Abstract

This paper begins with a discussion of integration over probability types (p-types).
It then revisits three mainstay problems of classical (non-quantum)
Shannon Information Theory (SIT): source coding without distortion, channel coding,
and source coding with distortion. The paper proves well-known, conventional results
for each of these three problems. However, the proofs given for these results are not
conventional. They are based on complex integration techniques (approximations
obtained by applying the method of steepest descent to p-type integrals) instead of
the usual delta & epsilon and typical sequences arguments. Another unconventional
feature of this paper is that we make ample use of classical Bayesian networks (CB
nets). This paper showcases some of the benefits of using CB nets to do classical SIT.
1 Introduction

For a good textbook on classical (non-quantum) Shannon Information Theory (SIT),
see, for example, Ref. [1] by Cover and Thomas. Henceforth we will refer to it as
C&T. For a good textbook on classical (non-quantum) Bayesian Networks, see, for
example, Ref. [2] by Koller and Friedman.

This paper begins with a discussion of integration over probability types (p-types).
After doing that, the paper revisits three mainstay problems of classical SIT:

• source coding (lossy compression) without distortion 

• channel coding 

• source coding with distortion 

The paper proves well-known, conventional results for each of these three problems.
However, the proofs given for these results are not conventional. They are based on
complex integration techniques (approximations obtained by applying the method of
steepest descent to p-type integrals) instead of the usual delta & epsilon and typical
sequences arguments.

Another unconventional feature of this paper is that we make ample use of 
classical Bayesian networks (CB nets). This paper showcases some of the benefits of 
using CB nets to do classical SIT. 

P-types were introduced into SIT by Csiszár and Körner (see Ref. [3]). P-type
integration is a natural, almost obvious consequence of the theory of p-types, although
it is not spelled out explicitly in the book by Csiszár and Körner. In fact, all workers
whose work I am familiar with, including Csiszár and Körner, use p-types frequently,
but they do not use p-type integration. Instead, they use delta & epsilon and typical
sequences arguments to bound some finite sums which are discrete approximations of
p-type integrals.

The conventional delta & epsilon arguments are more rigorous than the p-type
integration arguments presented here. Although less rigorous than traditional
arguments, p-type integration arguments have the virtue that they are easier to
understand and follow, especially by people who are not well versed in rigorous analysis.
Such is the case with many physicists and engineers. A similar situation arises when
teaching Calculus. One can teach Calculus with the full panoply of delta & epsilon
arguments from a textbook such as the legendary one by W. Rudin (Ref. [4]). Or one
can teach Calculus at the level and scope of a college freshman course for engineers.
Each approach appeals to a different audience and fulfils different needs.

Most of our results are not exact. They are leading order terms in asymptotic
expansions for large $n$, where $n$ is the number of letters in a codeword. These
approximations become increasingly more accurate as $n \to \infty$.

This paper is almost self-contained, although a few times we assume certain
inequalities and refer the reader to C&T for their proofs.
2 Preliminaries and Notation



In this section, we will describe some basic notation used throughout this paper. 

As usual, $\mathbb{Z}$, $\mathbb{R}$, $\mathbb{C}$ will denote the integers, real numbers, and complex numbers,
respectively. We will sometimes add superscripts to these symbols to indicate subsets
of these sets. For instance, we'll use $\mathbb{R}^{\geq 0}$ to denote the set of non-negative reals. For
$a, b \in \mathbb{Z}$ such that $a < b$, let $Z_{a,b} = \{a, a+1, a+2, \ldots, b\}$.

Let $\delta^x_y = \delta(x, y)$ denote the Kronecker delta function: it equals 1 if $x = y$ and
0 if $x \neq y$. Let $\theta(S)$ denote the truth function: it equals 1 if statement $S$ is true
and 0 otherwise. For example, $\delta^x_y = \theta(x = y)$. Another example is the step function
$\theta(x > 0)$: it equals 1 if $x > 0$ and is zero otherwise.

For any matrix $M \in \mathbb{C}^{p \times q}$, $M^*$ will denote its complex conjugate, $M^T$ its
transpose, and $M^\dagger = M^{*T}$ its Hermitian conjugate.

Random variables will be denoted by underlined letters; e.g., $\underline{a}$. The (finite)
set of values (states) that $\underline{a}$ can assume will be denoted by $S_a$. Let $N_a = |S_a|$. The
probability that $\underline{a} = a$ will be denoted by $P(\underline{a} = a)$ or $P_a(a)$, or simply by $P(a)$
if the latter will not lead to confusion in the context in which it is being used. We will use
$pd(S_a)$ to denote the set of all probability distributions with domain $S_a$. For joint
random variables $(\underline{a}, \underline{b})$, let $S_{a,b} = S_a \times S_b = \{(a, b) : a \in S_a, b \in S_b\}$.

Sometimes, when two random variables $\underline{a}(1)$ and $\underline{a}(2)$ satisfy $S_{a(1)} = S_{a(2)}$,
we will omit the indices (1) and (2) and refer to both random variables as $\underline{a}$. We
shall do this sometimes even if the random variables $\underline{a}(1)$ and $\underline{a}(2)$ are not identically
distributed! This notation, if used with caution, does not lead to confusion and does
avoid a lot of index clutter.

Suppose $\{P_{x,y}(x,y)\}_{\forall x,y} \in pd(S_{x,y})$. We will often use the expectation
operators $E_x = \sum_x P(x)$, $E_{x,y} = \sum_{x,y} P(x,y)$, $E_{y|x} = \sum_y P(y|x)$. Note that
$E_{x,y} = E_x E_{y|x}$. Let

$$ P(x : y) = \frac{P(x,y)}{P(x)P(y)} . \tag{1} $$

Note that $E_x P(x : y) = E_y P(x : y) = 1$.

Suppose $n$ is any positive integer. Let $\underline{x}^n = (\underline{x}_1, \underline{x}_2, \ldots, \underline{x}_n)$ be the random
variable that takes on values $x^n = (x_1, x_2, \ldots, x_n) \in S_x^n$.

The rate of $\underline{x}$ is defined as $R_x = \frac{\ln N_x}{n}$.

$\underline{x}^n$ is said to be i.i.d. (independent, identically distributed) if $S_{x_j} = S_x$ for all
$j \in Z_{1,n}$ and there is a $P_x \in pd(S_x)$ such that $P_{x^n}(x^n) = \prod_{j=1}^n P_x(x_j)$. When $\underline{x}^n$
is i.i.d., we will sometimes use $P_x(x^n)$ to denote the more correct expression $P_{x^n}(x^n)$
and say that $P_x(x^n)$ is an i.i.d. source.

Suppose $\{P(y^n|x^n)\}_{\forall y^n} \in pd(S_y^n)$ for all $x^n \in S_x^n$. $P(y^n|x^n)$ is said to be a
discrete memoryless channel (DMC) if $P(y^n|x^n) = \prod_{j=1}^n P(y_j|x_j)$.

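We will use the following measures of various types of information (entropy):

• The (plain) entropy of the random variable $\underline{x}$ is defined in the classical case by

$$ H(\underline{x}) = E_x \ln \frac{1}{P(x)} , \tag{2} $$

which we also call $H_{P_x}(\underline{x})$, $H\{P(x)\}_{\forall x}$, and $H(P_x)$. This quantity measures
the spread of $P_x$.

One can also consider plain entropy for a joint random variable $\underline{x} = (\underline{x}_1, \underline{x}_2)$.
For $P_{x_1,x_2} \in pd(S_{x_1,x_2})$ with marginal probability distributions $P_{x_1}$ and $P_{x_2}$,
one defines a joint entropy $H(\underline{x}_1, \underline{x}_2) = H(\underline{x})$ and partial entropies $H(\underline{x}_1)$
and $H(\underline{x}_2)$.

• The conditional entropy of $\underline{y}$ given $\underline{x}$ is defined in the classical case by

$$ H(\underline{y}|\underline{x}) = E_{x,y} \ln \frac{1}{P(y|x)} \tag{3a} $$
$$ = H(\underline{y}, \underline{x}) - H(\underline{x}) , \tag{3b} $$

which we also call $H_{P_{x,y}}(\underline{y}|\underline{x})$. This quantity measures the conditional spread
of $\underline{y}$ given $\underline{x}$.

• The Mutual Information (MI) of $\underline{x}$ and $\underline{y}$ is defined in the classical case by

$$ H(\underline{y} : \underline{x}) = E_{x,y} \ln P(x : y) = E_x E_y P(x : y) \ln P(x : y) \tag{4a} $$
$$ = H(\underline{x}) + H(\underline{y}) - H(\underline{y}, \underline{x}) , \tag{4b} $$

which we also call $H_{P_{x,y}}(\underline{y} : \underline{x})$. This quantity measures the correlation
between $\underline{x}$ and $\underline{y}$.

• The Conditional Mutual Information (CMI, which can be read as "see me") of
$\underline{x}$ and $\underline{y}$ given $\underline{\lambda}$ is defined in the classical case by:

$$ H(\underline{y} : \underline{x} | \underline{\lambda}) = E_{x,y,\lambda} \ln \frac{P(x,y|\lambda)}{P(x|\lambda)P(y|\lambda)} \tag{5a} $$
$$ = E_{x,y,\lambda} \ln \frac{P(x,y,\lambda)P(\lambda)}{P(x,\lambda)P(y,\lambda)} \tag{5b} $$
$$ = H(\underline{x}|\underline{\lambda}) + H(\underline{y}|\underline{\lambda}) - H(\underline{y}, \underline{x}|\underline{\lambda}) , \tag{5c} $$

which we also call $H_{P_{x,y,\lambda}}(\underline{y} : \underline{x} | \underline{\lambda})$. This quantity measures the conditional
correlation of $\underline{x}$ and $\underline{y}$ given $\underline{\lambda}$.

• The relative information of $P \in pd(S_x)$ divided by $Q \in pd(S_x)$ is defined by

$$ D\{P(x)//Q(x)\}_{\forall x} = \sum_x P(x) \ln \frac{P(x)}{Q(x)} , \tag{6} $$

which we also call $D(P_x // Q_x)$.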
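The definitions above are easy to check numerically. The following is a minimal sketch, not part of the original derivations: the joint distribution `joint` and all function names are hypothetical and chosen only for illustration. It evaluates the mutual information both from Eq. (4a) and from Eq. (4b), and converts the result from nats to bits.

```python
import math

def entropy(dist):
    """Plain entropy in nats: sum of P ln(1/P), skipping zero-probability states."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginalize a joint distribution keyed by (x, y) onto one coordinate."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

def mutual_information(joint):
    """Eq. (4a): E_{x,y} ln P(x:y) with P(x:y) = P(x,y) / (P(x) P(y))."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical joint distribution on {0,1} x {0,1}, chosen only for illustration.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px, py = marginal(joint, 0), marginal(joint, 1)

h_x, h_y, h_xy = entropy(px), entropy(py), entropy(joint)
h_y_given_x = h_xy - h_x                 # Eq. (3b)
mi_4a = mutual_information(joint)        # Eq. (4a)
mi_4b = h_x + h_y - h_xy                 # Eq. (4b)
mi_bits = mi_4a / math.log(2)            # nats -> bits

print(f"H(y|x) = {h_y_given_x:.6f} nats")
print(f"H(y:x) = {mi_4a:.6f} nats = {mi_bits:.6f} bits")
```

Working in natural logs throughout and dividing by ln 2 only at the end mirrors the strategy described in the text.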

Note that we define entropies using natural logs. Our strategy is to use natural
log entropies for all intermediate analytical calculations, and to convert to base-2 logs
at the end of those calculations if a base-2 log numerical answer is desired. Such a
conversion is of course trivial using $\log_2 x = \frac{\ln x}{\ln 2}$ and $\ln 2 = 0.6931$.

We will use the following well-known integral representation of the Dirac delta
function:

$$ \delta(x) = \int_{-\infty}^{+\infty} \frac{dk}{2\pi}\, e^{ikx} . \tag{7} $$

We will also use the following integral representation of the step function:

$$ \theta(x > 0) = \int_{-\infty}^{+\infty} \frac{dk}{2\pi i}\, \frac{e^{ikx}}{(k - i\epsilon)} , \tag{8} $$

for some $\epsilon > 0$. Eq. (8) follows because the integrand has a simple pole at $k = i\epsilon$. Let
$k = k_r + i k_i$. If $x > 0$, the integrand goes to zero in the upper half of the $(k_r, k_i)$ plane
and it goes to infinity in the lower half plane, so we are forced to close the contour
of integration in the upper half plane, which means the pole lies inside the contour.
When $x < 0$, we are forced to close the contour in the lower half plane and thus the
pole lies outside the contour.

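Suppose $\mathcal{L}(v)$ is a real valued function that depends in a continuous manner
on $N$ real variables $v = \{v_j\}_{j=1}^N$. The following variational operator can be applied
to $\mathcal{L}(v)$:

$$ \delta = \sum_{j=1}^{N} \delta v_j \frac{\partial}{\partial v_j} . \tag{9} $$

The $N$-dimensional Taylor expansion of $\mathcal{L}(v)$ about the point $v = 0$ can be expressed
as (with the increments $\delta v_j$ treated as constants equal to $v_j$)

$$ \mathcal{L}(v) = \mathcal{L}(0) + [\delta \mathcal{L}(v)]_{v=0} + \frac{1}{2!}[\delta^2 \mathcal{L}(v)]_{v=0} + \frac{1}{3!}[\delta^3 \mathcal{L}(v)]_{v=0} + \ldots . \tag{10} $$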

We will often use the following Taylor expansions:

$$ x^\epsilon = e^{\epsilon \ln x} = 1 + \epsilon \ln x + \frac{1}{2}(\epsilon \ln x)^2 + \ldots , \tag{11} $$

and

$$ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \ldots \quad (\text{converges if } |x| < 1) . \tag{12} $$

3 Integration Over P-types 

In this section, we will define integration over probability types (p-types). The set of
p-types for a given $n$ fills all of $pd(S_x)$ in an increasingly finer way as $n \to \infty$. Thus,
once the density of p-types at each point of $pd(S_x)$ is known, we can integrate that
density over a particular region $R \subset pd(S_x)$ to get the number of p-types within $R$.
We will define integration over p-types that depend on a single variable (univariate
p-types), or multiple variables (multivariate p-types). We will also define integration
over conditional p-types. Finally, we will define Dirac delta functions for integration
over p-types.

3.1 Integration Over Univariate P-type 

For any $x^n \in S_x^n$, denote the number of occurrences of $x \in S_x$ within $x^n$ by $N(x|x^n)$.
Hence

$$ N(x|x^n) = \sum_{j=1}^{n} \theta(x_j = x) . \tag{13} $$

One can now say that two elements $x^n$ and $x'^n$ of $S_x^n$ are equivalent if, for all $x \in S_x$,
$x^n$ and $x'^n$ both have the same number of occurrences of $x$. This equivalence relation
partitions $S_x^n$ into equivalence classes given by, for any $x^n \in S_x^n$,

$$ [x^n] = \{x'^n \in S_x^n : N(x|x^n) = N(x|x'^n)\ \forall x \in S_x\} . \tag{14} $$

For each class $[x^n]$ and $x \in S_x$, we can define

$$ P_{[x^n]}(x) = \frac{N(x|x^n)}{n} . \tag{15} $$

Clearly, $\{P_{[x^n]}(x)\}_{\forall x} \in pd(S_x)$. We will refer to this probability distribution as a
p-type.
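As an illustration of these definitions, here is a minimal sketch that computes the p-type of a short sequence and checks that the probability an i.i.d. source assigns to the sequence depends on the sequence only through its p-type. The alphabet, the sequence `xn`, the source `Q`, and the function name `p_type` are all hypothetical choices made for this example.

```python
import math
from collections import Counter

def p_type(seq, alphabet):
    """Return the p-type P_[x^n](x) = N(x|x^n)/n as a dict over the alphabet."""
    n = len(seq)
    counts = Counter(seq)
    return {x: counts.get(x, 0) / n for x in alphabet}

alphabet = "abc"
xn = "abacabaacb"                       # n = 10: five a's, three b's, two c's
ptype = p_type(xn, alphabet)

# Hypothetical i.i.d. source used only for this check.
Q = {"a": 0.5, "b": 0.3, "c": 0.2}

# Probability of the sequence, computed letter by letter ...
direct = math.prod(Q[s] for s in xn)
# ... and computed from the p-type alone: exp(n * sum_x P_[x^n](x) ln Q(x)).
via_type = math.exp(len(xn) * sum(p * math.log(Q[x])
                                  for x, p in ptype.items() if p > 0))

print(ptype)
print(direct, via_type)
```

Any permutation of `xn` lies in the same class and therefore yields the same p-type and the same probability.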

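Note that if $Q(x^n)$ is an i.i.d. source,

$$ Q(x^n) = \prod_{j=1}^{n} Q(x_j) = \prod_x Q(x)^{N(x|x^n)} , \tag{16} $$

so

$$ Q(x^n) = e^{n \sum_x P_{[x^n]}(x) \ln Q(x)} . \tag{17} $$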
Define the following integration operator:

$$ \int \mathcal{D}P_{[x^n]} = \int \prod_x \{ dP_{[x^n]}(x) \}\; \delta\Big( \sum_x P_{[x^n]}(x) - 1 \Big) . \tag{18} $$

We will denote the number of elements in a class $[x^n]$ by

$$ d_{[x^n]} = |[x^n]| . \tag{19} $$

Claim 1

$$ \sum_{x^n} = \sum_{[x^n]} d_{[x^n]} . \tag{20} $$

proof: The classes $[x^n]$ are non-overlapping and they cover all of $S_x^n$.
QED

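Claim 2 For any $x^n \in S_x^n$,

$$ d_{[x^n]} = (d_{[x^n]})_{H=0}\; e^{n H(P_{[x^n]})} , \tag{21} $$

where

$$ (d_{[x^n]})_{H=0} = \frac{1}{(2\pi n)^{\frac{N_x - 1}{2}} \sqrt{\prod_x P_{[x^n]}(x)}} . \tag{22} $$

proof: Let

$$ S_x = \{x(j) : j \in Z_{1,N_x}\} \tag{23} $$

and

$$ r_j = N(x(j)|x^n) \tag{24} $$

for all $j \in Z_{1,N_x}$. Note that $\sum_{j=1}^{N_x} r_j = n$. Recall Stirling's formula:

$$ n! \approx \sqrt{2\pi n}\; n^n e^{-n} \tag{25} $$

for $n \gg 1$. Combinatorics gives a value for $|[x^n]|$ in terms of factorials. If we
approximate those factorials using Stirling's formula, we get

$$ |[x^n]| = \frac{n!}{\prod_j r_j!} \tag{26a} $$
$$ \approx \frac{1}{(2\pi)^{\frac{N_x-1}{2}}} \left( \frac{n}{r_1 r_2 \cdots r_{N_x}} \right)^{\frac{1}{2}} e^{-n + n \ln n - \sum_j \{ -r_j + r_j \ln r_j \}} \tag{26b} $$
$$ = \frac{1}{(2\pi)^{\frac{N_x-1}{2}}} \left( \frac{n}{r_1 r_2 \cdots r_{N_x}} \right)^{\frac{1}{2}} \exp\Big( -n \sum_j \frac{r_j}{n} \ln \frac{r_j}{n} \Big) . \tag{26c} $$

Substituting $r_j = n P_{[x^n]}(x(j))$ into Eq. (26c) gives Eqs. (21) and (22).
QED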
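The quality of the Stirling approximation behind Claim 2 can be checked numerically. The sketch below is an illustration only; the occupation numbers and function names are hypothetical. It compares the logarithm of the exact class size (a multinomial coefficient) with the logarithm of the leading-order expression $e^{nH}/[(2\pi n)^{(N_x-1)/2}\sqrt{\prod_x P(x)}]$. Logarithms are used so that large $n$ does not overflow floating point.

```python
import math

def log_class_size_exact(counts):
    """ln of the multinomial coefficient n! / prod_j r_j!, via log-gamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(r + 1) for r in counts)

def log_class_size_leading_order(counts):
    """ln of exp(n H) / [(2 pi n)^((N-1)/2) sqrt(prod_x P(x))] with P = counts / n."""
    n = sum(counts)
    num_letters = len(counts)
    p = [r / n for r in counts]
    entropy = -sum(v * math.log(v) for v in p)
    log_prefactor = (-0.5 * (num_letters - 1) * math.log(2 * math.pi * n)
                     - 0.5 * sum(math.log(v) for v in p))
    return log_prefactor + n * entropy

# Hypothetical occupation numbers r_j for a three-letter alphabet.
small = (50, 30, 20)       # n = 100
large = (500, 300, 200)    # n = 1000, same p-type

gap_small = log_class_size_leading_order(small) - log_class_size_exact(small)
gap_large = log_class_size_leading_order(large) - log_class_size_exact(large)
print(gap_small, gap_large)
```

The gap between the two logarithms shrinks as $n$ grows at fixed p-type, which is what one expects of a leading-order asymptotic result.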
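Claim 3

$$ \sum_{[x^n]} = \int \mathcal{D}P_{[x^n]}\; n^{N_x - 1} . \tag{27} $$

proof: For any i.i.d. source $Q(x^n)$, we have that

$$ 1 = \sum_{x^n} Q(x^n) \tag{28a} $$
$$ = \sum_{[x^n]} d_{[x^n]}\, e^{n \sum_x P_{[x^n]}(x) \ln Q(x)} \tag{28b} $$
$$ = \frac{1}{\Delta V} \int \mathcal{D}P_{[x^n]}\; \frac{e^{\mathcal{L}_0}}{(2\pi n)^{\frac{N_x-1}{2}} \sqrt{\prod_x P_{[x^n]}(x)}} , \tag{28c} $$

where $\Delta V$ is yet to be determined and

$$ \mathcal{L}_0 = n \sum_x P_{[x^n]}(x) \ln \frac{Q(x)}{P_{[x^n]}(x)} . \tag{29} $$

We add to $\mathcal{L}_0$ a Lagrange multiplier term that constrains the components of the vector
$\{P_{[x^n]}(x)\}_{\forall x}$ so that they sum to one:

$$ \mathcal{L} = \mathcal{L}_\lambda = \mathcal{L}_0 + n\lambda \Big( \sum_x P_{[x^n]}(x) - 1 \Big) \tag{30} $$

for any $\lambda \in \mathbb{R}$. Our goal is to approximate the integral Eq. (28c) using the method
of steepest descent. We just want to get the leading order term in an asymptotic
expansion of the integral for large $n$. To get this leading order term, it is sufficient to
approximate $\mathcal{L}$ to second order in $\delta P_{[x^n]}(x)$, about the point (or points) that have a
vanishing first variation $\delta\mathcal{L}$. Thus, approximate

$$ \mathcal{L} \approx \tilde{\mathcal{L}} + \delta\tilde{\mathcal{L}} + \frac{1}{2}\delta^2\tilde{\mathcal{L}} , \tag{31} $$

where quantities with a tilde over them are evaluated at a tilde (saddle) point that
satisfies

$$ \delta\tilde{\mathcal{L}} = 0 . \tag{32} $$

It's easy to check that

$$ \delta\mathcal{L} = n \sum_x \delta P_{[x^n]}(x) \ln \left[ \frac{Q(x) e^{-1+\lambda}}{P_{[x^n]}(x)} \right] \tag{33} $$

and

$$ \delta^2\mathcal{L} = -n \sum_x \frac{[\delta P_{[x^n]}(x)]^2}{P_{[x^n]}(x)} . \tag{34} $$

Next, for each $x$, we set to zero the coefficient of $\delta P_{[x^n]}(x)$ in $\delta\mathcal{L}$. After doing that,
we enforce the constraint that $\sum_x P_{[x^n]}(x) = 1$. This leads us to conclude that

$$ \tilde{P}_{[x^n]}(x) = Q(x) . \tag{35} $$

Using this value of $\tilde{P}_{[x^n]}(x)$, we get

$$ \tilde{\mathcal{L}} = 0 \tag{36} $$

and

$$ \delta^2\tilde{\mathcal{L}} = -n \sum_x \frac{[\delta P_{[x^n]}(x)]^2}{Q(x)} . \tag{37} $$

From Eq. (28c), we get

$$ 1 = \frac{1}{\Delta V}\; \frac{I}{(2\pi n)^{\frac{N_x-1}{2}} \sqrt{\prod_x Q(x)}} , \tag{38} $$

where, writing $\Delta P(x) = P_{[x^n]}(x) - Q(x)$,

$$ I = \int \mathcal{D}P_{[x^n]}\; e^{-n \sum_x \frac{[\Delta P(x)]^2}{2Q(x)}} = \left( \frac{2\pi}{n} \right)^{\frac{N_x-1}{2}} \sqrt{\prod_x Q(x)} . \tag{39} $$

The final integral was performed using Eq. (198). This implies $1/\Delta V = n^{N_x - 1}$.
QED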

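Note that Eq. (27), combined with the integral $\int \mathcal{D}P_{[x^n]} = 1/(N_x - 1)!$, implies that

$$ \sum_{[x^n]} 1 = \frac{n^{N_x - 1}}{(N_x - 1)!} , \tag{40} $$

so the number of p-types with a given $n$ in $pd(S_x)$ varies polynomially with $n$.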
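The polynomial growth of the number of p-types can be compared against an exact count. The sketch below is only an illustration with hypothetical function names. It uses the standard combinatorial fact that the number of ways to write $n$ as an ordered sum of $N_x$ non-negative integers is $\binom{n + N_x - 1}{N_x - 1}$, and compares this with the leading-order estimate $n^{N_x-1}/(N_x-1)!$.

```python
import math

def count_p_types(n, num_letters):
    """Exact number of p-types: ordered splits of n into num_letters non-negative counts."""
    return math.comb(n + num_letters - 1, num_letters - 1)

def leading_order(n, num_letters):
    """Leading-order estimate n^(N-1) / (N-1)! for large n."""
    return n ** (num_letters - 1) / math.factorial(num_letters - 1)

for n in (10, 100, 1000):
    exact = count_p_types(n, 3)
    approx = leading_order(n, 3)
    print(n, exact, approx, exact / approx)

ratio_1000 = count_p_types(1000, 3) / leading_order(1000, 3)
```

The ratio of the exact count to the estimate tends to 1 as $n$ grows, consistent with the estimate being the leading term of an asymptotic expansion.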



3.2 Integration Over Multivariate P-types 

There exists a very natural 1-1 onto map from $S_x^n \times S_y^n$ to $(S_x \times S_y)^n$, namely the one
that identifies $(x_j)_{\forall j}(y_j)_{\forall j}$ with $(x_j, y_j)_{\forall j}$. Thus, the definitions and claims given in
the previous section for $N(x|x^n)$, $[x^n]$, $P_{[x^n]}(x)$ and $\int \mathcal{D}P_{[x^n]}$ generalize very naturally
to give analogous definitions and claims for $N(x,y|x^n,y^n)$, $[x^n,y^n]$, $P_{[x^n,y^n]}(x,y)$ and
$\int \mathcal{D}P_{[x^n,y^n]}$. For example,

$$ P_{[x^n,y^n]}(x,y) = \frac{N(x,y|x^n,y^n)}{n} . \tag{41} $$



We will sometimes use $[\,]$ as an abbreviation for a class. For example, we
might abbreviate $P_{[a^n,b^n,c^n]}(a, b, c)$ by $P_{[\,]}(a, b, c)$.

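Note that when $y^n = x^n$ in $P_{[x^n,y^n]}$,

$$ P_{[x^n,x^n]}(x, y) = \delta_x^y\, P_{[x^n]}(x) . \tag{42} $$

Note also that we can express $\delta_{x^n}^{y^n}$ as follows

$$ e^{n \sum_{x,y} P_{[x^n,y^n]}(x,y) \ln \delta_x^y} = \begin{cases} 0, & \text{if } \exists (x,y) \text{ such that } \delta_x^y = 0 \text{ and } P_{[x^n,y^n]}(x,y) \neq 0 \\ 1, & \text{otherwise} \end{cases} \tag{43a} $$
$$ = \theta\big( \forall (x,y) : y \neq x \Rightarrow P_{[x^n,y^n]}(x,y) = 0 \big) \tag{43b} $$
$$ = \delta_{x^n}^{y^n} . \tag{43c} $$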

3.3 Integration Over Conditional P-types 

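For any $x^n \in S_x^n$ and $y^n \in S_y^n$, define conditional classes by

$$ [y^n|x^n] = \{ y'^n \in S_y^n : N(x,y|x^n,y^n) = N(x,y|x^n,y'^n)\ \forall (x,y) \} \tag{44} $$

and conditional probability types by

$$ P_{[y^n|x^n]}(y|x) = \frac{N(x,y|x^n,y^n)}{N(x|x^n)} = \frac{P_{[x^n,y^n]}(x,y)}{P_{[x^n]}(x)} \tag{45} $$

for all $x \in S_x$ and $y \in S_y$.

We will sometimes use $[\,]$ as an abbreviation for a conditional class. For
example, we might abbreviate $P_{[a^n,b^n|c^n,d^n]}(a,b|c,d)$ by $P_{[\,]}(a,b|c,d)$.

Define the following integration operator:

$$ \int \mathcal{D}P_{[y^n|x^n]} = \int \prod_{x,y} \{ dP_{[y^n|x^n]}(y|x) \} \prod_x \delta\Big( \sum_y P_{[y^n|x^n]}(y|x) - 1 \Big) . \tag{46} $$

We will denote the number of elements in conditional class $[y^n|x^n]$ by

$$ d_{[y^n|x^n]} = |[y^n|x^n]| . \tag{47} $$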

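Claim 4

$$ \sum_{[x^n]} d_{[x^n]} \sum_{[y^n|x^n]} d_{[y^n|x^n]} = \sum_{[x^n,y^n]} d_{[x^n,y^n]} . \tag{48} $$

proof: For any DMC $Q(y^n|x^n)$, we must have

$$ 1 = \sum_{[y^n|x^n]} d_{[y^n|x^n]}\, Q(y^n|x^n) . \tag{49} $$

If $Q(x^n)$ is an i.i.d. source and $Q(x^n,y^n) = Q(y^n|x^n)Q(x^n)$, then the last equation
implies

$$ 1 = \sum_{[x^n]} d_{[x^n]} Q(x^n) \sum_{[y^n|x^n]} d_{[y^n|x^n]} Q(y^n|x^n) \tag{50a} $$
$$ = \sum_{[x^n]} \sum_{[y^n|x^n]} d_{[x^n]}\, d_{[y^n|x^n]}\, Q(x^n,y^n) . \tag{50b} $$

But also

$$ 1 = \sum_{[x^n,y^n]} d_{[x^n,y^n]}\, Q(x^n,y^n) . \tag{51} $$

Since $Q(x^n,y^n)$ is an arbitrary i.i.d. source, the claim follows.
QED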



Claim 5

$$ d_{[x^n,y^n]} = d_{[x^n]}\, d_{[y^n|x^n]} . \tag{52} $$

proof: Combinatorics.
QED

Claim 6

$$ \sum_{[x^n]} \sum_{[y^n|x^n]} = \sum_{[x^n,y^n]} . \tag{53} $$

proof: This follows from Claims 4 and 5.
QED

Alternatively, one could prove Claim 6 by combinatorics and then prove Claim
5 from Claims 4 and 6.
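The combinatorial factorization of class sizes, namely that the size of a joint class equals the size of the class of $x^n$ times the size of the conditional class of $y^n$ given $x^n$, can be confirmed by brute force for a small case. The sketch below is an illustration under the assumption of binary alphabets and $n = 4$; all names are hypothetical.

```python
from collections import Counter
from itertools import product

def counts(xs):
    """Occupation numbers N(x|x^n), as a hashable class label."""
    return frozenset(Counter(xs).items())

def joint_counts(xs, ys):
    """Joint occupation numbers N(x,y|x^n,y^n), as a hashable class label."""
    return frozenset(Counter(zip(xs, ys)).items())

n = 4
seqs = list(product((0, 1), repeat=n))

# Sizes of the classes [x^n] and of the joint classes [x^n, y^n].
d_x = Counter(counts(xs) for xs in seqs)
d_xy = Counter(joint_counts(xs, ys) for xs in seqs for ys in seqs)

def d_y_given_x(xs, ys):
    """Size of the conditional class [y^n | x^n] for a fixed x^n."""
    target = joint_counts(xs, ys)
    return sum(1 for ys2 in seqs if joint_counts(xs, ys2) == target)

factorizes = all(
    d_xy[joint_counts(xs, ys)] == d_x[counts(xs)] * d_y_given_x(xs, ys)
    for xs in seqs for ys in seqs
)
print("joint class size factorizes for every pair:", factorizes)
```

The check enumerates all 256 pairs of binary sequences of length 4, so it is exhaustive for this small case but says nothing by itself about general alphabets.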



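Claim 7

$$ \int \mathcal{D}P_{[x^n]} \int \mathcal{D}P_{[y^n|x^n]} \prod_x \{ P_{[x^n]}(x) \}^{N_y - 1} = \int \mathcal{D}P_{[x^n,y^n]} . \tag{54} $$

proof: Let LHS and RHS denote the left hand side and right hand side of Eq. (54).

Recall that Dirac delta functions obey $\delta(ax) = \frac{\delta(x)}{|a|}$. This proof hinges on
that simple identity.

Define

$$ \mathcal{D}_1 = \prod_x \{ dP_{[x^n]}(x) \}\; \delta\Big( \sum_x P_{[x^n]}(x) - 1 \Big) \tag{55} $$

and

$$ \mathcal{D}_2 = \prod_{x,y} \{ dP_{[y^n|x^n]}(y|x) \} \prod_x \delta\Big( \sum_y P_{[y^n|x^n]}(y|x) - 1 \Big) . \tag{56} $$

Then, changing variables from $P_{[y^n|x^n]}(y|x)$ to $P_{[x^n,y^n]}(x,y) = P_{[y^n|x^n]}(y|x) P_{[x^n]}(x)$,

$$ LHS = \int \mathcal{D}_1 \mathcal{D}_2 \prod_x \{ P_{[x^n]}(x) \}^{N_y - 1} \tag{57a} $$
$$ = \int \mathcal{D}_1 \prod_{x,y} \{ dP_{[x^n,y^n]}(x,y) \} \prod_x \delta\Big( \sum_y P_{[x^n,y^n]}(x,y) - P_{[x^n]}(x) \Big) \tag{57b} $$
$$ = \int \prod_{x,y} \{ dP_{[x^n,y^n]}(x,y) \}\; \delta\Big( \sum_{x,y} P_{[x^n,y^n]}(x,y) - 1 \Big) \tag{57c} $$
$$ = RHS . \tag{57d} $$

This works because LHS has $n_I = N_x + N_x N_y$ integrals and $n_\delta = N_x + 1$
delta functions, for a total of $n_I - n_\delta = N_x N_y - 1$ degrees of freedom. RHS has
$N_x N_y$ integrals and one delta function for the same total of $N_x N_y - 1$ degrees of
freedom.
QED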

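Claim 8

$$ \sum_{[y^n|x^n]} = \int \mathcal{D}P_{[y^n|x^n]}\; n^{N_x N_y - N_x} \prod_x \{ P_{[x^n]}(x) \}^{N_y - 1} \tag{58a} $$
$$ = \int \mathcal{D}P_{[y^n|x^n]}\; \big( n P^{g.m.}_{[x^n]} \big)^{N_x N_y - N_x} , \tag{58b} $$

where

$$ P^{g.m.}_{[x^n]} = \Big( \prod_x P_{[x^n]}(x) \Big)^{\frac{1}{N_x}} \tag{59} $$

is the geometric mean of $P_{[x^n]}$.

proof: Substitute

$$ \int \mathcal{D}P_{[x^n]} = \frac{1}{n^{N_x - 1}} \sum_{[x^n]} \tag{60} $$

and

$$ \int \mathcal{D}P_{[x^n,y^n]} = \frac{1}{n^{N_x N_y - 1}} \sum_{[x^n,y^n]} \tag{61} $$

into Eq. (54) and then compare the result with Eq. (53).
QED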

3.4 Dirac Delta Functions For P-type Integration 

One occasionally finds it useful to use Dirac delta functions for p-type integration.
Suppose $x^n, y^n \in S_x^n$ and $\epsilon$ is a real number satisfying $0 < \epsilon \ll 1$. Let $X = [x^n]$ and
$Y = [y^n]$. Define

V; = " - ;L (62) 

for any positive real number $a$. We will refer to the following functions as Dirac delta
functions for setting $X$ and $Y$ equal:

s{x,y) = e{x = y) , (63) 



and 



Claim 9 



and 



6,{X, y) = exp I -- > ^{P^ix) - Pyix)y I , (64) 
^.(x",l/") = 4^|^, (65) 



^AP.-Py) = ''^. (66) 



5^5,(x",y") = l, (67) 



y VPx 6,{Px -Py) = l. (68) 



proof: This follows from integration formula Eq. (198).
QED 
4 Source Coding (Lossy Compression)

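We consider all source coding protocols that can be described by the following CB
net

[CB net diagram: $\underline{x}^n \to \underline{m} \to \underline{\hat{x}}^n$] (69)

with $S_{\hat{x}} = S_x$ and

$$ P(x^n) = \prod_{j=1}^{n} P_x(x_j) , \tag{70} $$
$$ P(m|x^n) = \delta(m, m(x^n)) \tag{71} $$

and

$$ P(\hat{x}^n|m) = \delta(\hat{x}^n, \hat{x}^n(m)) . \tag{72} $$

Assume that we are given a source $P_x \in pd(S_x)$. The encoding function $m(\cdot)$ and
the decoding function $\hat{x}^n(\cdot)$ are yet to be specified.^1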
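The probability of error is defined by

$$ P_{err} = P(\underline{\hat{x}}^n \neq \underline{x}^n) . \tag{73} $$

We find it more convenient to work with the probability of success, which is defined
by $P_{suc} = 1 - P_{err}$. One has

$$ P_{suc} = 1 - P_{err} \tag{74a} $$
$$ = P(\underline{\hat{x}}^n = \underline{x}^n) \tag{74b} $$
$$ = \sum_{\hat{x}^n, m, x^n} \theta(\hat{x}^n = x^n) P(\hat{x}^n|m) P(m|x^n) P_x(x^n) \tag{74c} $$
$$ = \sum_{x^n} P_x(x^n)\, \delta[x^n, \hat{x}^n \circ m(x^n)] . \tag{74d} $$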

Now it's time to decide what encoding and decoding functions we want to
consider. Suppose $A$ is a proper subset of $S_x^n$. One can give each element of $A$ an
individual number (its index) from 1 to $|A|$. Assume, without loss of generality, that
$0^n \notin A$. As we shall see, the following encoding and decoding functions are good
enough:

$$ m(x^n) = \begin{cases} \text{index of } x^n \text{ in } A, & \text{if } x^n \in A \\ 0, & \text{if } x^n \notin A \end{cases} \tag{75} $$

and

$$ \hat{x}^n(m) = \begin{cases} \text{the element of } A \text{ with index } m, & \text{if } m \neq 0 \\ 0^n, & \text{if } m = 0 \end{cases} \tag{76} $$

where the set $A$ is given by either

$$ A = \Big\{ x^n : R > \sum_x P_{[x^n]}(x) \ln \frac{1}{P_x(x)} \Big\} , \tag{77} $$

or

$$ A = \{ x^n : R > H(P_{[x^n]}) \} = \Big\{ x^n : R > \sum_x P_{[x^n]}(x) \ln \frac{1}{P_{[x^n]}(x)} \Big\} \tag{78} $$

for some positive number $R$ yet to be specified. These two interesting options for the
set $A$ can be considered simultaneously by defining

$$ A = \Big\{ x^n : R > \sum_x P_{[x^n]}(x) \ln \frac{1}{Q(x)} \Big\} , \tag{79} $$

where

$$ Q(x) = \begin{cases} P_x(x), & \text{source dependent coding} \\ P_{[x^n]}(x), & \text{universal coding} \end{cases} \tag{80} $$

In the case of source dependent coding, $Q$ (and therefore the functions $m(\cdot)$ and
$\hat{x}^n(\cdot)$) depend on the source distribution $P_x$. In the case of universal coding, $Q$ is
independent of the source.

^1 Many authors (for instance, C&T) denote the encoding function $m(\cdot)$ by $f(\cdot)$ and the decoding
function $\hat{x}^n(\cdot)$ by $g(\cdot)$.

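Note that for these encoding and decoding functions,

$$ \delta[x^n, \hat{x}^n \circ m(x^n)] = \theta(x^n \in A) = \theta\Big( R > \sum_x P_{[x^n]}(x) \ln \frac{1}{Q(x)} \Big) \tag{81} $$

for all $x^n \in S_x^n - \{0^n\}$ so

$$ P_{suc} \approx \sum_{x^n} P_x(x^n)\, \theta\Big( R > \sum_x P_{[x^n]}(x) \ln \frac{1}{Q(x)} \Big) \tag{82a} $$
$$ = \int \mathcal{D}P_{[x^n]}\; n^{N_x - 1} d_{[x^n]}\, e^{n \sum_x P_{[x^n]}(x) \ln P_x(x)}\; \theta\Big( R > \sum_x P_{[x^n]}(x) \ln \frac{1}{Q(x)} \Big) \tag{82b} $$
$$ \approx \theta(R > H(P_x)) . \tag{82c} $$

Eq. (82c) follows because, as is easily proven, applying the method of steepest descent
to the p-type integral yields a tilde point:

$$ \tilde{P}_{[x^n]}(x) = P_x(x) . \tag{83} $$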
As mentioned in the notation section, we define $R_m$ by

$$ R_m = \frac{\ln N_m}{n} . \tag{84} $$

So far, it's not clear what value to use for the constant $R$ that appears in the definition
of set $A$. In the next Claim, we will show that it must equal $R_m$ for our arguments
to be valid.

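Claim 10

$$ R = R_m \tag{85} $$

for consistency of our arguments.
proof: We must have

$$ N_m = \sum_{x^n} \theta(x^n \in A) \tag{86a} $$
$$ \approx \int \mathcal{D}P_{[x^n]}\; n^{N_x - 1} (d_{[x^n]})_{H=0}\, e^{n H(P_{[x^n]})}\; \theta\Big( R > \sum_x P_{[x^n]}(x) \ln \frac{1}{Q(x)} \Big) \tag{86b} $$
$$ = e^{nR} \int \mathcal{D}P_{[x^n]}\; n^{N_x - 1} (d_{[x^n]})_{H=0}\, e^{n [H(P_{[x^n]}) - R]}\; \theta\Big( R > \sum_x P_{[x^n]}(x) \ln \frac{1}{Q(x)} \Big) \tag{86c} $$
$$ \approx e^{nR}\, \theta(R > H(P_x)) . \tag{86d} $$

As long as $R > H(\underline{x})$, our approximations are valid and $N_m = e^{nR}$.
QED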
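The two conclusions of this section, $P_{suc} \approx \theta(R > H(P_x))$ and $N_m \approx e^{nR}$, can be illustrated numerically for universal coding of a binary source. The sketch below is an illustration only: the Bernoulli source, the values of $n$, $p$ and $R$, and all function names are assumptions made for this example, and the bookkeeping around the special sequence $0^n$ is ignored. Because every sequence in a class has the same p-type, the sums over sequences reduce to sums over the number $k$ of ones.

```python
import math

def binary_entropy(q):
    """Entropy in nats of a two-letter distribution (q, 1 - q)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)

def universal_code_stats(n, p, rate):
    """Return (ln|A| / n, P_suc) for A = {x^n : H(P_[x^n]) < rate}, Bernoulli(p) source."""
    size, p_suc = 0, 0.0
    for k in range(n + 1):                      # k = number of ones in x^n
        if binary_entropy(k / n) < rate:
            class_size = math.comb(n, k)        # number of sequences with this p-type
            size += class_size
            p_suc += math.exp(math.log(class_size)
                              + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.log(size) / n, p_suc

n, p = 400, 0.2
print("source entropy:", binary_entropy(p))     # about 0.50 nats
rate_hi, suc_hi = universal_code_stats(n, p, 0.6)   # R above the source entropy
rate_lo, suc_lo = universal_code_stats(n, p, 0.4)   # R below the source entropy
print(rate_hi, suc_hi)
print(rate_lo, suc_lo)
```

With $R$ above the source entropy the success probability is close to one, with $R$ below it the success probability is close to zero, and in both cases $\ln|A|/n$ is close to $R$ up to corrections that vanish as $n$ grows.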

5 Channel Coding 

We define a codebook $C$ as an $N_m \times n$ matrix given by $C = \{x^n(m)\}_{\forall m} = x^n(\cdot)$ where
$x^n(m) \in S_x^n$ for all $m \in S_m$.

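We consider all channel coding protocols that can be described by the following
CB net

[CB net diagram: $\underline{m} \to \underline{x}^n \to \underline{y}^n \to \underline{\hat{m}}$, with $\underline{C}$ a parent of both $\underline{x}^n$ and $\underline{\hat{m}}$] (87)

with

$$ P(m) = \frac{1}{N_m} , \tag{88} $$
$$ P(x^n|m, C) = \delta(x^n, x^n(m)) , \tag{89} $$
$$ P(y^n|x^n) = \prod_{j=1}^{n} P_{y|x}(y_j|x_j) , \tag{90} $$
$$ P(C) = \text{to be specified} , \tag{91} $$

and

$$ P(\hat{m}|y^n, C) = \text{to be specified} . \tag{92} $$

Assume that we are given a channel $\{P_{y|x}(y|x)\}_{\forall y} \in pd(S_y)$ for all $x \in S_x$. The
encoding $P(C)$ and decoding $P(\hat{m}|y^n, C)$ probability distributions are yet to be
specified.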

It's convenient to define the coding rate $R_m$ by

$$ R_m = \frac{\ln N_m}{n} \tag{93} $$

and the channel capacity $C$ by

$$ C = \max_{P_x} H(\underline{y} : \underline{x}) . \tag{94} $$

Claim 11 (Independence upper bound for mutual information of DMC) If $P(y^n|x^n) =
\prod_{j=1}^{n} P(y_j|x_j)$ (this is what is called a discrete memoryless channel, DMC), then

$$ H(\underline{y}^n : \underline{x}^n) \leq \sum_{j=1}^{n} H(\underline{y}_j : \underline{x}_j) . \tag{95} $$

Furthermore, equality holds iff the $\underline{x}_j$ are mutually independent.

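proof: Assume $n = 3$ for illustrative purposes. If the $\underline{x}_j$ are not independent, we
must consider the following CB net

[CB net diagram: $\underline{x}_1$, $\underline{x}_2$, $\underline{x}_3$ connected to each other; $\underline{x}_j \to \underline{y}_j$ for $j = 1, 2, 3$] (96)

If the $\underline{x}_j$ are independent, then this becomes

[CB net diagram: $\underline{x}_j \to \underline{y}_j$ for $j = 1, 2, 3$, with no arrows among the $\underline{x}_j$] (97)

In the case of Eq. (96),

$$ H(\underline{y}^n : \underline{x}^n) = H(\underline{y}^n) - H(\underline{y}^n|\underline{x}^n) = H(\underline{y}^n) - \sum_j H(\underline{y}_j|\underline{x}_j) \tag{98a} $$
$$ \leq \sum_j \{ H(\underline{y}_j) - H(\underline{y}_j|\underline{x}_j) \} \tag{98b} $$
$$ = \sum_j H(\underline{y}_j : \underline{x}_j) . \tag{98c} $$

Eq. (98b) follows from the "subadditivity" or "independence upper bound" of the joint
entropy, which says that $H(\underline{a}, \underline{b}) \leq H(\underline{a}) + H(\underline{b})$ for any random variables $\underline{a}$ and $\underline{b}$.
(See C&T for a proof of subadditivity.) If the $\underline{x}_j$ are mutually independent, then the
$\underline{y}_j$ must be mutually independent too, in which case Eq. (98b) becomes an equality.
Conversely, if Eq. (98b) is an equality, then the $\underline{y}_j$ must be mutually independent so
the $\underline{x}_j$ must be too.
QED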
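The independence upper bound can be checked numerically for a two-letter block. The sketch below is an illustration only: the binary symmetric channel, the two input distributions, and all function names are assumptions chosen for this example. It compares the block mutual information with the sum of the per-letter mutual informations, once for correlated inputs and once for independent inputs.

```python
import math
from itertools import product

def mutual_information(joint):
    """Mutual information in nats of a joint distribution keyed by (a, b)."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def bsc(y, x, f):
    """Transition probability P(y|x) of a binary symmetric channel."""
    return 1 - f if y == x else f

def compare(p_inputs, f=0.1):
    """Return (H(y^2 : x^2), sum_j H(y_j : x_j)) for a 2-letter input law sent through a BSC."""
    block, letters = {}, [{}, {}]
    for (x1, x2), px in p_inputs.items():
        for y1, y2 in product((0, 1), repeat=2):
            p = px * bsc(y1, x1, f) * bsc(y2, x2, f)   # memoryless channel
            block[((x1, x2), (y1, y2))] = p
            letters[0][(x1, y1)] = letters[0].get((x1, y1), 0.0) + p
            letters[1][(x2, y2)] = letters[1].get((x2, y2), 0.0) + p
    return mutual_information(block), sum(mutual_information(l) for l in letters)

correlated = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

lhs_c, rhs_c = compare(correlated)
lhs_i, rhs_i = compare(independent)
print("correlated inputs :", lhs_c, "<", rhs_c)
print("independent inputs:", lhs_i, "=", rhs_i)
```

For the correlated inputs the inequality is strict, while for the independent inputs the two sides agree to rounding error.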

Claim 12 Optimality: $\forall R_m$, if $\exists$ an encoding and a decoding that satisfy $\lim_{n \to \infty} P_{err} = 0$
for the CB net of Eq. (87), then $R_m \leq C$.

proof:

$$ n R_m = \ln N_m = H(\underline{m}) = H(\underline{y}^n : \underline{m}) + H(\underline{m}|\underline{y}^n) \tag{99a} $$
$$ \leq H(\underline{y}^n : \underline{m}) + n\delta \tag{99b} $$
$$ \leq H(\underline{y}^n : \underline{x}^n) + n\delta \tag{99c} $$
$$ \leq \sum_{j=1}^{n} H(\underline{y}_j : \underline{x}_j) + n\delta \tag{99d} $$
$$ \leq n(C + \delta) \tag{99e} $$

(99b): This follows from Fano's inequality. (See C&T for a proof of Fano's inequality.)
$\delta$ is some positive number that tends to zero as $n \to \infty$.

(99c): This follows from the data processing inequalities. (See C&T for a proof of the
data processing inequalities.)

(99d): This follows from Claim 11.

(99e): This follows from the definition of channel capacity $C$.

QED

Claim 13 Achievability: $\forall R_m$, if $R_m < C$, then $\exists$ an encoding and a decoding that
satisfy $\lim_{n \to \infty} P_{err} = 0$ for the CB net of Eq. (87).

proof: So far, the encoding and decoding probability distributions are unspecified.
In this proof, we will use one possible choice for these distributions. This choice,
although not very practical, turns out to yield optimal results. For $P(C)$ we choose
what is called random coding:

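$$ P(C) = P_x(x^n(\cdot)) = \prod_m P_x(x^n(m)) = \prod_{m,j} P_x(x_j(m)) \tag{100} $$

for some source $P_x \in pd(S_x)$. For $P(\hat{m}|y^n, C)$ we choose a maximum likelihood
decoder:^2

$$ P(\hat{m}|y^n, C) = \prod_{m \neq \hat{m}} \theta\Big( R < \frac{1}{n} \ln \frac{P(y^n|x^n(\hat{m}))}{P(y^n|x^n(m))} \Big) \tag{101a} $$
$$ = \prod_{m \neq \hat{m}} \theta\Big( R < \frac{1}{n} \ln \frac{P(y^n : x^n(\hat{m}))}{P(y^n : x^n(m))} \Big) \tag{101b} $$

for some $R > 0$. Note that there is no guarantee that this definition of $P(\hat{m}|y^n, C)$
is a well defined probability distribution satisfying $\sum_{\hat{m}} P(\hat{m}|y^n, C) = 1$. In the next
Claim, we will prove that if $R = R_m$, then $P(\hat{m}|y^n, C)$ is well defined.
The probability of error is defined by

$$ P_{err} = P(\underline{\hat{m}} \neq \underline{m}) . \tag{102} $$

^2 By $\prod_{m \neq \hat{m}}$ we mean $\prod_{m \in S_m - \{\hat{m}\}}$.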
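We find it more convenient to work with the probability of success, which is defined
by $P_{suc} = 1 - P_{err}$. One has

$$ P_{suc} = 1 - P_{err} \tag{103a} $$
$$ = P(\underline{\hat{m}} = \underline{m}) \tag{103b} $$
$$ = \sum_{\hat{m}, m} \theta(\hat{m} = m) P(\hat{m}, m) \tag{103c} $$
$$ = \sum_{\hat{m}, m, y^n, x^n, C} \theta(\hat{m} = m) P(\hat{m}|y^n, C) P(y^n|x^n) \delta(x^n, x^n(m)) P(m) P(C) \tag{103d} $$
$$ = \sum_C P(C) \sum_m P(m) \sum_{y^n} P(\hat{m} = m|y^n, C) P(y^n|x^n(m)) . \tag{103e} $$

The choice of $\hat{m} \in S_m$ does not matter. Any choice would give the same
answer for $P_{suc}$, so

$$ P_{suc} = \sum_C P(C) \sum_{y^n} P(\hat{m}|y^n, C) P(y^n|x^n(\hat{m})) . \tag{104} $$

Thus

$$ P_{suc} = \sum_C P(C) \sum_{y^n} P(y^n|x^n(\hat{m})) \prod_{m \neq \hat{m}} \theta\Big( R < \frac{1}{n} \ln \frac{P(y^n : x^n(\hat{m}))}{P(y^n : x^n(m))} \Big) . \tag{105} $$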

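Let

$$ \int_{k(\cdot)} = \prod_{m \neq \hat{m}} \Big\{ \int_{-\infty}^{+\infty} \frac{dk(m)}{2\pi i} \frac{1}{(k(m) - i\epsilon)} \Big\} , \tag{106} $$

and

$$ K = \sum_{m \neq \hat{m}} k(m) . \tag{107} $$

Expressing the $\theta$ functions in Eq. (105) as integrals (see Eq. (8)), we get

$$ P_{suc} = \int_{k(\cdot)} e^{-iKR} \sum_{y^n, x^n(\cdot)} \exp\Big( n \sum_{y, x(\cdot)} P_{[\,]}(y, x(\cdot)) \ln Z(y, x(\cdot)) \Big) , \tag{108} $$

where

$$ Z(y, x(\cdot)) = P(y)\, P^{1 + i\frac{K}{n}}(y : x(\hat{m}))\, P_x(x(\hat{m})) \prod_{m \neq \hat{m}} \Big\{ P^{-i\frac{k(m)}{n}}(y : x(m))\, P_x(x(m)) \Big\} . \tag{109} $$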
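Next we express the sum over $y^n, x^n(\cdot)$ as a p-type integral to get

$$ P_{suc} = \int_{k(\cdot)} e^{-iKR} \int \mathcal{D}P_{[\,]}\; n^{N_y N_x^{N_m} - 1} (d_{[y^n, x^n(\cdot)]})_{H=0}\; e^{\mathcal{L}_0} , \tag{110} $$

where

$$ \mathcal{L}_0 = n \sum_{y, x(\cdot)} P_{[\,]}(y, x(\cdot)) \ln \frac{Z(y, x(\cdot))}{P_{[\,]}(y, x(\cdot))} . \tag{111} $$

We add to $\mathcal{L}_0$ a Lagrange multiplier term that constrains the components of the vector
$\{P_{[\,]}(y, x(\cdot))\}_{\forall y, x(\cdot)}$ so that they sum to one:

$$ \mathcal{L} = \mathcal{L}_\lambda = \mathcal{L}_0 + n\lambda \Big( \sum_{y, x(\cdot)} P_{[\,]}(y, x(\cdot)) - 1 \Big) \tag{112} $$

for any $\lambda \in \mathbb{R}$. It's easy to check that $\mathcal{L}$ is maximized when

$$ \tilde{P}_{[\,]}(y, x(\cdot)) = \frac{Z(y, x(\cdot))}{\sum_{y, x(\cdot)} Z(y, x(\cdot))} . \tag{113} $$

Evaluating the integrand of the p-type integral in Eq. (110) at this tilde point yields

$$ P_{suc} = \int_{k(\cdot)} e^{-iKR}\, e^{n \ln Z} , \tag{114} $$

where

$$ Z = \sum_{y, x(\cdot)} Z(y, x(\cdot)) . \tag{115} $$

Using the shorthand notations

$$ E_y = \sum_y P(y) , \qquad E_{x(m)} = \sum_{x(m)} P_x(x(m)) , \tag{116} $$

$Z$ can be expressed as

$$ Z = E_y \Big\{ E_{x(\hat{m})}\big[ P^{1 + i\frac{K}{n}}(y : x(\hat{m})) \big] \prod_{m \neq \hat{m}} E_{x(m)}\big[ P^{-i\frac{k(m)}{n}}(y : x(m)) \big] \Big\} . \tag{117} $$

Define

$$ Z_0 = [Z]_{k(m) = 0\ \forall m} = E_y E_{x(\hat{m})}\big[ P^{1 + i\frac{K}{n}}(y : x(\hat{m})) \big] . \tag{118} $$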



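Note that 1 equals

$$ 1 = \int_{-\infty}^{+\infty} dK\; \delta\Big( K - \sum_{m \neq \hat{m}} k(m) \Big) \tag{119a} $$
$$ = \int_{-\infty}^{+\infty} dK \int_{-\infty}^{+\infty} \frac{dh}{2\pi}\; e^{ih \{ \sum_{m \neq \hat{m}} k(m) - K \}} . \tag{119b} $$

Multiplying $P_{suc}$ by 1 certainly doesn't change it. Thus the right hand sides of
Eqs. (114) and (119b) can be multiplied to get

$$ P_{suc} = \int_{-\infty}^{+\infty} \frac{dh}{2\pi} \int_{-\infty}^{+\infty} dK\; e^{iK(-h - R)} \int_{k(\cdot)} e^{ih \sum_{m \neq \hat{m}} k(m) + n \ln Z} . \tag{120} $$

Next we will assume that, for all $m$, when doing the contour integration over
$k(m)$ in Eq. (120) with $Z$ given by Eq. (117), the $e^{n \ln Z}$ can be evaluated at the value
$k(m) = i\epsilon \to 0$ of the pole.^3 Symbolically, this means we assume

$$ \int_{k(\cdot)} e^{ih \sum_{m \neq \hat{m}} k(m) + n \ln Z} = e^{n \ln Z_0} \int_{k(\cdot)} e^{ih \sum_{m \neq \hat{m}} k(m)} \tag{121a} $$
$$ = e^{n \ln Z_0}\, \theta(h > 0) . \tag{121b} $$

Applying Eq. (121b) to Eq. (120) gives

$$ P_{suc} = \int_{-\infty}^{+\infty} \frac{dh}{2\pi}\, \theta(h > 0) \int_{-\infty}^{+\infty} dK\; e^{iK(-h - R)}\, e^{n \ln Z_0} . \tag{122} $$

Next we use Eqs. (11) and (12) to expand $\ln Z_0$ to second order in $K$. This
yields

$$ \ln Z_0 \approx i\frac{K}{n} a - \frac{K^2}{2n^2} b , \tag{123} $$

where

$$ a = H(\underline{y} : \underline{x}) , \tag{124} $$

and

$$ b = E_y E_x P(y : x) \ln^2 P(y : x) - H^2(\underline{y} : \underline{x}) \tag{125a} $$
$$ = E_{x,y} \ln^2 P(y : x) - [E_{x,y} \ln P(y : x)]^2 \tag{125b} $$
$$ \geq 0 . \tag{125c} $$

^3 I don't know how to prove this assumption rigorously. The assumption is plausible, and it does
lead to the correct result for the channel capacity. It may just be an approximation that becomes
increasingly good as $n \to \infty$.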
(The inequality follows from the identity $\langle x^2 \rangle - \langle x \rangle^2 = \langle (x - \langle x \rangle)^2 \rangle$ where $\langle \cdot \rangle$
denotes an average and $x$ is any random variable.)

With the $\ln Z_0$ expanded to second order in $K$, Eq. (122) becomes

$$ P_{suc} = \int_{-\infty}^{+\infty} \frac{dh}{2\pi}\, \theta(h > 0) \int_{-\infty}^{+\infty} dK\; e^{iK(-h - R + a) - \frac{b}{2n} K^2} . \tag{126} $$

If we keep only the term linear in $K$ in the argument of the exponential, we immediately
get

$$ P_{suc} = \theta(R < H(\underline{y} : \underline{x})) . \tag{127} $$

If we also keep the term quadratic in $K$, we get

$$ P_{suc} = \frac{1}{2} \mathrm{erfc}\Big( \sqrt{\frac{n}{2b}}\, [R - H(\underline{y} : \underline{x})] \Big) . \tag{128} $$

Maximizing both sides of Eq. (127) with respect to the source $P_x$, and using
the definition of channel capacity $C$, we get that there is an encoding and a decoding
for which

$$ P_{suc} = \theta(R < C) . \tag{129} $$

QED
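To see how the complementary error function in the success probability sharpens into a step function as $n$ grows, one can evaluate $\frac{1}{2}\mathrm{erfc}(\sqrt{n/(2b)}\,[R - a])$ for a specific channel, where $a$ is the mutual information and $b$ is the variance of $\ln P(y:x)$. The sketch below is an illustration only: it assumes a binary symmetric channel with uniform input, for which $P(y:x)$ takes the value $2(1-f)$ with probability $1-f$ and $2f$ with probability $f$. The flip probability and function names are hypothetical.

```python
import math

def bsc_moments(f):
    """Return a = E[ln P(y:x)] and b = Var[ln P(y:x)] for a BSC with uniform input."""
    outcomes = [(math.log(2 * (1 - f)), 1 - f),   # y equals x
                (math.log(2 * f), f)]             # y differs from x
    a = sum(value * prob for value, prob in outcomes)
    b = sum(value * value * prob for value, prob in outcomes) - a * a
    return a, b

def p_success(n, rate, a, b):
    """Second-order estimate (1/2) erfc( sqrt(n / (2b)) * (rate - a) )."""
    return 0.5 * math.erfc(math.sqrt(n / (2 * b)) * (rate - a))

a, b = bsc_moments(0.1)
print(f"a = H(y:x) = {a:.4f} nats, b = {b:.4f}")
for n in (100, 1000, 10000):
    below = p_success(n, a - 0.05, a, b)
    above = p_success(n, a + 0.05, a, b)
    print(n, round(below, 4), round(above, 4))
```

At a rate exactly equal to the mutual information the estimate is one half for every $n$. Slightly below it the estimate climbs toward one, and slightly above it the estimate falls toward zero, with a transition width that scales like $\sqrt{b/n}$.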
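Claim 14

$$ R = R_m \tag{130} $$

for consistency of our arguments.

proof: Rather than checking that $\sum_{\hat{m}} P(\hat{m}|y^n, C) = 1$, we will check that the total
probability distribution for the whole CB net Eq. (87) sums to one. We want

$$ 1 = \sum_{\hat{m}, m, y^n, x^n, C} P(\hat{m}|y^n, C) P(y^n|x^n) \delta(x^n, x^n(m)) P(m) P(C) . \tag{131} $$

Using

$$ \sum_{\hat{m}, m} = \sum_{\hat{m}, m} \theta(\hat{m} = m) + \sum_{\hat{m}, m} \theta(\hat{m} \neq m) , \tag{132} $$

and

$$ \sum_{\hat{m}, m} \theta(\hat{m} \neq m) P(m) = \frac{N_m^2 - N_m}{N_m} \approx N_m , \tag{133} $$

we get for any pair $m_0, \hat{m} \in S_m$ such that $m_0 \neq \hat{m}$,

$$ 1 = P_{suc} + N_m E_C \sum_{y^n} P(\hat{m}|y^n, C) P(y^n|x^n(m_0)) , \tag{134} $$

where $E_C = \sum_C P(C)$. Substituting into Eq. (134) the specific values of the probability
distributions $P(\hat{m}|y^n, C)$ and $P(y^n|x^n(m_0))$, we get

$$ P_{err} = N_m \int_{-\infty}^{+\infty} \frac{dh}{2\pi} \int_{-\infty}^{+\infty} dK\; e^{iK(-h - R)} \int_{k(\cdot)} e^{ih \sum_{m \neq \hat{m}} k(m) + n \ln W} , \tag{135} $$

where $\int_{k(\cdot)}$ is defined as before (see Eq. (106)) and where

$$ W = E_y \Big\{ E_{x(\hat{m})}\big[ P^{i\frac{K}{n}}(y : x(\hat{m})) \big] \prod_{m \neq \hat{m}} E_{x(m)}\big[ P^{\delta_m^{m_0} - i\frac{k(m)}{n}}(y : x(m)) \big] \Big\} . \tag{136} $$

Let

$$ W_0 = [W]_{k(m) = 0\ \forall m} = E_y E_{x(\hat{m})}\big[ P^{i\frac{K}{n}}(y : x(\hat{m})) \big] . \tag{137} $$

Next assume that

$$ \int_{k(\cdot)} e^{ih \sum_{m \neq \hat{m}} k(m) + n \ln W} = e^{n \ln W_0} \int_{k(\cdot)} e^{ih \sum_{m \neq \hat{m}} k(m)} \tag{138a} $$
$$ = e^{n \ln W_0}\, \theta(h > 0) . \tag{138b} $$

Applying Eq. (138b) to Eq. (135) yields

$$ P_{err} = N_m \int_{-\infty}^{+\infty} \frac{dh}{2\pi}\, \theta(h > 0) \int_{-\infty}^{+\infty} dK\; e^{iK(-h - R)}\, e^{n \ln W_0} . \tag{139} $$

Now we can make the following change of variables

$$ K \to K - in . \tag{140} $$

Note that this change of variables changes $W_0$ defined by Eq. (137) to $Z_0$ defined by
Eq. (118). Under this change of variables, Eq. (139) becomes

$$ P_{err} = N_m \int_{-\infty}^{+\infty} \frac{dh}{2\pi}\, \theta(h > 0)\, e^{n(-h - R)} \int_{-\infty}^{+\infty} dK\; e^{iK(-h - R)}\, e^{n \ln Z_0} \tag{141a} $$
$$ \approx N_m e^{-nR} P_{suc} , \tag{141b} $$

or, equivalently,

$$ \theta(R > H(\underline{y} : \underline{x})) \approx N_m e^{-nR}\, \theta(R < H(\underline{y} : \underline{x})) . \tag{142} $$

Thus, when $R$ equals (or is very close to) $H(\underline{y} : \underline{x})$, we must have $N_m = e^{nR}$.
QED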
6 Source Coding With Distortion

Assume that we are given a function $d(x, y)$ that measures the distance between two
letters $x, y \in S_x$. Assume $d(x, x) = 0$ and $d(x, y) \geq 0$ for all $x, y \in S_x$.

Assume that random variables $\underline{x}$ and $\underline{\hat{x}}$ both have the same set of possible
values $S_x$. We define codebook $C$ as an $N_m \times n$ matrix given by $C = \{x^n(m)\}_{\forall m} = x^n(\cdot)$
where $x^n(m) \in S_x^n$ for all $m \in S_m$. We define another codebook $\hat{C}$ as an $N_m \times n$
matrix given by $\hat{C} = \{\hat{x}^n(m)\}_{\forall m} = \hat{x}^n(\cdot)$ where $\hat{x}^n(m) \in S_x^n$ for all $m \in S_m$.

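We consider all source coding protocols that can be described by the following
CB net:

[CB net diagram: $\underline{\hat{C}} \to \underline{C}$; $\underline{x}^n \to \underline{m}$ and $\underline{C} \to \underline{m}$; $\underline{m} \to \underline{\hat{x}}^n$ and $\underline{\hat{C}} \to \underline{\hat{x}}^n$] (143)

with $S_{\hat{x}} = S_x$ and

$$ P(x^n) = \prod_{j=1}^{n} P_x(x_j) , \tag{144} $$
$$ P(m|x^n, C) = \text{to be specified} , \tag{145} $$
$$ P(\hat{C}) = \text{to be specified} , \tag{146} $$
$$ P(C|\hat{C}) = \prod_m P_{x|\hat{x}}(x^n(m)|\hat{x}^n(m)) = \prod_{m,j} P_{x|\hat{x}}(x_j(m)|\hat{x}_j(m)) , \tag{147} $$

and

$$ P(\hat{x}^n|m, \hat{C}) = \delta(\hat{x}^n, \hat{x}^n(m)) . \tag{148} $$

Assume that we are given a source $\{P_x(x)\}_{\forall x} \in pd(S_x)$ and a channel $\{P_{\hat{x}|x}(\hat{x}|x)\}_{\forall \hat{x} \in S_x} \in
pd(S_x)$ for all $x \in S_x$. The encoding $P(m|x^n, C)$ and decoding $P(\hat{C})$ probability
distributions are yet to be specified.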

Henceforth, we will use the following shorthand notations

$$ E_j = \frac{1}{n} \sum_{j=1}^{n} , \qquad E_{\hat{x},x} = \sum_{\hat{x},x} P_{\hat{x}|x}(\hat{x}|x) P_x(x) . \tag{149} $$

As usual, we define the rate of $\underline{m}$ by $R_m = \ln(N_m)/n$. We define the
probability of success by

$$ P_{suc} = P[E_j d(\underline{\hat{x}}_j, \underline{x}_j) \leq D] , \tag{150} $$

where $D \in \mathbb{R}^{\geq 0}$ is called the distortion. Note that when $D = 0$, $P_{suc} = P(\underline{\hat{x}}^n =
\underline{x}^n)$, which is what we used previously when we considered source coding without
distortion.

For any source $P_x$ and distortion $D$, it is useful to define a rate distortion
function $H_x(D)$ by

$$ H_x(D) = \min_{P_{\hat{x}|x} :\, E_{\hat{x},x} d(\hat{x},x) \leq D} H_{P_{\hat{x}|x} P_x}(\underline{\hat{x}} : \underline{x}) . \tag{151} $$

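Claim 15 (Properties of $H_x(D)$)

(a) $H_x(D)$ is a monotonically non-increasing, convex function of $D$.

(b) $H_x(0) = H(\underline{x})$

(c) $H_x(E^Q_{\hat{x},x} d(\hat{x},x)) \leq H_Q(\underline{\hat{x}} : \underline{x})$, where $E^Q_{\hat{x},x} = \sum_{\hat{x},x} Q(\hat{x},x)$, where $\{Q(\hat{x},x)\}_{\forall \hat{x},x} \in
pd(S_{\hat{x},x})$ such that $\sum_{\hat{x}} Q(\hat{x},x) = P_x(x)$ for all $x$.

proof:

proof of (a): Monotonicity is obvious. To prove convexity, recall (see C&T
for a proof) that the mutual information is a convex function of its joint probability
(for a fixed marginal $P_x$, which is the case here).
This means that for any $\lambda \in [0, 1]$ and $P_1, P_0 \in pd(S_{\hat{x},x})$, if

$$ P_\lambda(\hat{x}, x) = \lambda P_1(\hat{x}, x) + (1 - \lambda) P_0(\hat{x}, x) \tag{152} $$

for all $\hat{x}, x$, then

$$ H_{P_\lambda}(\underline{\hat{x}} : \underline{x}) \leq \lambda H_{P_1}(\underline{\hat{x}} : \underline{x}) + (1 - \lambda) H_{P_0}(\underline{\hat{x}} : \underline{x}) . \tag{153} $$

For any $\lambda \in [0, 1]$, let $D_0, D_1 \in \mathbb{R}^{\geq 0}$ and

$$ D_\lambda = \lambda D_1 + (1 - \lambda) D_0 . \tag{154} $$

Suppose $P_0, P_1 \in pd(S_{\hat{x},x})$ such that $\sum_{\hat{x}} P_j(\hat{x}, x) = P_x(x)$ for all $x$ and

$$ H_x(D_j) = H_{P_j}(\underline{\hat{x}} : \underline{x}) \tag{155} $$

for $j = 0, 1$. Define $P_\lambda$ by Eq. (152). Then

$$ H_x(D_\lambda) \leq H_{P_\lambda}(\underline{\hat{x}} : \underline{x}) \tag{156a} $$
$$ \leq \lambda H_{P_1}(\underline{\hat{x}} : \underline{x}) + (1 - \lambda) H_{P_0}(\underline{\hat{x}} : \underline{x}) \tag{156b} $$
$$ = \lambda H_x(D_1) + (1 - \lambda) H_x(D_0) . \tag{156c} $$

proof of (b): If $D = 0$, then $P(\hat{x}|x) = \delta_x^{\hat{x}}$ so $H(\underline{\hat{x}} : \underline{x}) = H(\underline{x})$.

proof of (c): This follows from the definition of $H_x(D)$.

QED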
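Properties (a) and (b) can be illustrated numerically for a binary source. The sketch below is an illustration only: it assumes a Hamming distance $d(\hat{x}, x) = \theta(\hat{x} \neq x)$, a source with $P_x(1) = 0.3$, and a coarse grid search over the two free parameters of the channel $P_{\hat{x}|x}$; all names are hypothetical. Because the grid only samples the feasible set, the computed values are upper bounds on the true minimum. They are compared with the well-known closed form $H_b(p) - H_b(D)$ for this particular case.

```python
import math

def binary_entropy(q):
    """Entropy in nats of a two-letter distribution (q, 1 - q)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)

def mutual_information(p1, a, b):
    """H(xhat : x) in nats; p1 = P(x=1), a = P(xhat=1|x=0), b = P(xhat=0|x=1)."""
    joint = {(0, 0): (1 - p1) * (1 - a), (1, 0): (1 - p1) * a,
             (0, 1): p1 * b, (1, 1): p1 * (1 - b)}          # keyed by (xhat, x)
    p_hat = {0: joint[(0, 0)] + joint[(0, 1)], 1: joint[(1, 0)] + joint[(1, 1)]}
    p_x = {0: 1 - p1, 1: p1}
    return sum(p * math.log(p / (p_hat[xh] * p_x[x]))
               for (xh, x), p in joint.items() if p > 0)

def rate_distortion(p1, distortion, steps=50):
    """Grid-search minimum of H(xhat : x) subject to expected Hamming distance <= distortion."""
    best = float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1):
            a, b = i / steps, j / steps
            if (1 - p1) * a + p1 * b <= distortion + 1e-12:   # small slack for rounding
                best = min(best, mutual_information(p1, a, b))
    return best

p1 = 0.3
levels = [0.0, 0.05, 0.1, 0.2, 0.3]
values = [rate_distortion(p1, d) for d in levels]
for d, v in zip(levels, values):
    print(d, round(v, 4), round(max(binary_entropy(p1) - binary_entropy(d), 0.0), 4))
```

At zero distortion the only feasible channel is the identity, so the grid search returns the source entropy. As the allowed distortion increases the feasible set grows, so the computed values cannot increase.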
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Claim 16 Optimality: ^{D, Rm), fl?^ encoding and a decoding that satisfy lim„_ 
for the CB net of Eq.^TJ^, then R^n > H^{D). 

proof: 



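\begin{align}
nR_m = \ln N_m = H(\underline{m}) &= H(\underline{x}^n : \underline{m}) + H(\underline{m}|\underline{x}^n) \tag{157a}\\
&\ge H(\underline{x}^n : \underline{m}) \tag{157b}\\
&\ge H(\underline{\hat{x}}^n : \underline{x}^n) \tag{157c}\\
&= \sum_j H(\underline{\hat{x}}_j : \underline{x}_j) \tag{157d}\\
&\ge \sum_j H_{\underline{x}}\left(E_{\hat{x}_j,x_j} d(\hat{x}_j,x_j)\right) \tag{157e}\\
&\ge n H_{\underline{x}}\Big(\frac{1}{n}\sum_j E_{\hat{x}_j,x_j} d(\hat{x}_j,x_j)\Big) \tag{157f}\\
&= n H_{\underline{x}}\left(E_{\hat{x},x} d(\hat{x},x)\right) \tag{157g}\\
&\ge n H_{\underline{x}}(D) \tag{157h}
\end{align}

(157c): This follows from the data processing inequalities. (See C&T for a proof of the data processing inequalities.)

(157d): This follows from Claim 11 in the case of equality. We are assuming that $P(\hat{C}|C)$ is a DMC, and that $P(C)$ is an i.i.d. source. This forces $(\hat{x}_j(m), x_j(m))$ and $(\hat{x}_{j'}(m), x_{j'}(m))$ with $j \ne j'$ to be independent.

(157e): This follows from Claim 15 part (c).

(157f): This follows because $H_{\underline{x}}(D)$ is a convex function of $D$.

(157g): This follows from using $P_{[]}(\hat{x},x) \to P(\hat{x},x)$.

(157h): Eq.(150) is the definition of $D$. Expressing Eq.(150) in terms of p-types and using $P_{[]}(\hat{x},x) \to P(\hat{x},x)$, we find that $E_{\hat{x},x} d(\hat{x},x) \le D$ is necessary for success. Then use the fact that $H_{\underline{x}}(D)$ is non-increasing.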



QED 

Claim 17 Achievability: $\forall (D, R_m)$, if $R_m > H_{\underline{x}}(D)$, then $\exists$ an encoding and a decoding that satisfy $\lim_{n\to\infty} P_{err} = 0$ for the CB net of this section.

proof: So far, the encoding and decoding probability distributions are unspecified. In this proof, we will use one possible choice for these distributions. For decoder $P(C)$ we choose:

$$P(C) = P(\hat{x}^n(\cdot)) = \prod_m \{P(\hat{x}^n(m))\} = \prod_{m,j}\{P_{\underline{\hat{x}}}(\hat{x}_j(m))\}, \tag{158}$$

and for encoder $P(m|x^n, C)$ we choose:

\begin{align}
P(m|x^n, C) &= \prod_{m' \ne m} \theta\left(R > \frac{1}{n}\ln\frac{P(x^n|\hat{x}^n(m))}{P(x^n|\hat{x}^n(m'))}\right) \tag{159a}\\
&= \prod_{m' \ne m} \theta\left(R > \frac{1}{n}\ln\frac{P(x^n : \hat{x}^n(m))}{P(x^n : \hat{x}^n(m'))}\right) \tag{159b}
\end{align}

for some $R > 0$. Note that there is no guarantee that this definition of $P(m|x^n, C)$ is a well defined probability distribution satisfying $\sum_m P(m|x^n, C) = 1$. In the next Claim, we will prove that if $R = R_m$, then $P(m|x^n, C)$ is well defined.
Let

$$P(\hat{C}) = \sum_{C} P(\hat{C}|C)P(C). \tag{160}$$

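One has

\begin{align}
P_{suc} &= P\left[E_j\, d(\underline{\hat{x}}_j, \underline{x}_j) \le D\right] \tag{161a}\\
&= \sum_{\hat{x}^n, x^n} P(\hat{x}^n, x^n)\,\theta(E_j d(\hat{x}_j, x_j) \le D) \tag{161b}\\
&= \sum_{\hat{x}^n, x^n, m, C} P(\hat{x}^n|m,C)P(m|x^n,C)P(x^n)P(C)\,\theta(E_j d(\hat{x}_j, x_j) \le D) \tag{161c}\\
&= \sum_m E_C E_{x^n} P(m|x^n,C)\,\theta(E_j d(\hat{x}_j(m), x_j) \le D). \tag{161d}
\end{align}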

Consider what happens to P(m|x"',C) in Eq. fll61d|) as — t- 0. When D — )■ 

0, x"(m) — x" by virtue of Eq. fll61dp . Hence P(x"'|x"(m)) — 1. Furthermore, 

P(a;"|x"(m')) P(x"(m)|x"(m')) = P(a;"(m))5™' = P(x")5;;j'. Thus 

P(m|x^ C) ^ ^ ("P > i In -pi-] = 0(x'^ G J . (162) 



Hence, when $D = 0$, the encoder $P(m|x^n, C)$ in Eq.(161d) is the same as the one we used when we considered source coding without distortion.

For any $Q \in pd(S_{\underline{\hat{x}},\underline{x}})$ such that $\sum_{\hat{x}} Q(\hat{x},x) = P_{\underline{x}}(x)$ for all $x$, define

$$\theta_{Q(\hat{x},x)} = \theta_{Q_{\hat{x},x}} = \theta\Big(\sum_{\hat{x},x} Q(\hat{x},x)\,d(\hat{x},x) \le D\Big). \tag{163}$$

Note that

$$\theta(E_j d(\hat{x}_j(1), x_j) \le D) = \theta_{P_{[]}(\hat{x}(1),x)} \tag{164}$$

$$\sum_m E_C = N_m E_C \tag{165}$$

Hence, the choice of $m \in S_m$ in Eq.(161d) does not matter. Any choice would give the same answer for $P_{suc}$. Thus, Eq.(161d) can be replaced by the following. Assume $1 \in S_m$, and replace $m$ by 1 and $m'$ by $m$. Also use Eq.(164). Then

$$P_{suc} = N_m E_C E_{x^n} \prod_{m \ne 1}\left\{\theta\left(R > \frac{1}{n}\ln\frac{P(x^n : \hat{x}^n(1))}{P(x^n : \hat{x}^n(m))}\right)\right\}\theta_{P_{[]}(\hat{x}(1),x)}. \tag{166a}$$



If we assume that our formalism will eventually justify the physically plausible assumption that $P_{[]}(\hat{x}(1),x) \to P_{\underline{\hat{x}},\underline{x}}(\hat{x}(1),x)$, then we may replace $\theta_{P_{[]}(\hat{x}(1),x)}$ by $\theta_{P_{\hat{x},x}}$ at this point. This would simplify the analysis below. Instead, we will continue with $\theta_{P_{[]}(\hat{x}(1),x)}$ and show that our formalism does indeed lead to the same result as if we had replaced $\theta_{P_{[]}(\hat{x}(1),x)}$ by $\theta_{P_{\hat{x},x}}$ at this point.

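Let

$$\int_{k(\cdot)} = \prod_{m \ne 1}\left\{\int_{-\infty}^{+\infty}\frac{dk(m)}{2\pi i}\,\frac{1}{(k(m) - i\epsilon)}\right\} \tag{167}$$

and

$$K = \sum_{m \ne 1} k(m). \tag{168}$$

Expressing the $\theta$ functions in Eq.(166a) as integrals (see Eq.(8)), we get

$$P_{suc} = N_m \int_{k(\cdot)} e^{iKR}\, E_C E_{x^n} \exp\Big(n\sum_{\hat{x}(\cdot),x} P_{[]}(\hat{x}(\cdot),x)\ln Z(\hat{x}(\cdot),x)\Big)\,\theta_{P_{[]}(\hat{x}(1),x)} \tag{169}$$

where

$$Z(\hat{x}(\cdot),x) = \prod_{m \ne 1}\left\{\left[\frac{P(x : \hat{x}(m))}{P(x : \hat{x}(1))}\right]^{i\frac{k(m)}{n}}\right\}. \tag{170}$$

Next we express the sum over $\hat{x}^n(\cdot), x^n$ as a p-type integral to get

$$P_{suc} = N_m \int_{k(\cdot)} e^{iKR} \int \mathcal{D}P_{[]}\; e^{n\mathcal{L}_0}\,\theta_{P_{[]}(\hat{x}(1),x)} \tag{171}$$

where

$$\mathcal{L}_0 = \sum_{\hat{x}(\cdot),x} P_{[]}(\hat{x}(\cdot),x)\ln\frac{P(\hat{x}(\cdot),x)\,Z(\hat{x}(\cdot),x)}{P_{[]}(\hat{x}(\cdot),x)} \tag{172}$$

with $P(\hat{x}(\cdot),x) = P_{\underline{x}}(x)\prod_m P_{\underline{\hat{x}}}(\hat{x}(m))$.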



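We add to $\mathcal{L}_0$ a Lagrange multiplier term that constrains the components of the vector $\{P_{[]}(\hat{x}(\cdot),x)\}_{\forall \hat{x}(\cdot),x}$ so that they sum to one:

$$\mathcal{L} = \mathcal{L}_0 + \lambda\Big(\sum_{\hat{x}(\cdot),x} P_{[]}(\hat{x}(\cdot),x) - 1\Big) \tag{173}$$

for any $\lambda \in \mathbb{R}$. It's easy to check that $\mathcal{L}$ is maximized when

$$\tilde{P}_{[]}(\hat{x}(\cdot),x) = \frac{P(\hat{x}(\cdot),x)\,Z(\hat{x}(\cdot),x)}{\sum_{\hat{x}(\cdot),x} P(\hat{x}(\cdot),x)\,Z(\hat{x}(\cdot),x)}, \tag{174}$$

where $P(\hat{x}(\cdot),x) = P_{\underline{x}}(x)\prod_m P_{\underline{\hat{x}}}(\hat{x}(m))$. Evaluating the integrand of the p-type integral in Eq.(171) at this tilde point yields

$$P_{suc} = N_m \int_{k(\cdot)} e^{iKR}\, e^{n\ln Z}\,\theta_{\tilde{P}_{[]}(\hat{x}(1),x)} \tag{175}$$

where

$$Z = \sum_{\hat{x}(\cdot),x} P(\hat{x}(\cdot),x)\,Z(\hat{x}(\cdot),x). \tag{176}$$

$Z$ can be expressed as

$$Z = E_x\Big[E_{\hat{x}(1)}\big[P^{-i\frac{K}{n}}(\hat{x}(1) : x)\big]\prod_{m \ne 1}\Big\{E_{\hat{x}(m)}\big[P^{i\frac{k(m)}{n}}(\hat{x}(m) : x)\big]\Big\}\Big]. \tag{177}$$

Define

$$Z_0 = [Z]_{k(m)=0\ \forall m} = E_x E_{\hat{x}(1)}\big[P^{-i\frac{K}{n}}(\hat{x}(1) : x)\big]. \tag{178}$$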

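Note that 1 equals

\begin{align}
1 &= \int_{-\infty}^{+\infty} dK\; \delta\Big(\sum_{m \ne 1}\{k(m)\} - K\Big) \tag{179a}\\
&= \int_{-\infty}^{+\infty} dK \int_{-\infty}^{+\infty}\frac{dh}{2\pi}\; e^{ih\left(\sum_{m \ne 1} k(m) - K\right)}. \tag{179b}
\end{align}

Multiplying $P_{suc}$ by 1 certainly doesn't change it. Thus the right hand sides of Eqs.(175) and (179b) can be multiplied to get

$$P_{suc} = N_m \int_{-\infty}^{+\infty} dK \int_{-\infty}^{+\infty}\frac{dh}{2\pi}\; e^{iK(R-h)} \int_{k(\cdot)} e^{ih\sum_{m \ne 1} k(m)}\, e^{n\ln Z}\,\theta_{\tilde{P}_{[]}(\hat{x}(1),x)}. \tag{180}$$

Next we will assume that, for all $m$, when doing the contour integration over $k(m)$ in Eq.(180) with $Z$ given by Eq.(177), the $e^{n\ln Z}\theta_{\tilde{P}_{[]}(\hat{x}(1),x)}$ can be evaluated at the value $k(m) = i\epsilon \to 0$ of the pole.^4 Symbolically, this means we assume

\begin{align}
\int_{k(\cdot)} e^{ih\sum_{m \ne 1} k(m)}\, e^{n\ln Z}\,\theta_{\tilde{P}_{[]}(\hat{x}(1),x)} &= e^{n\ln Z_0}\,\theta_{\tilde{P}_{[]}(\hat{x}(1),x)} \prod_{m \ne 1}\left\{\int_{-\infty}^{+\infty}\frac{dk(m)}{2\pi i}\,\frac{e^{ihk(m)}}{(k(m) - i\epsilon)}\right\} \tag{181a}\\
&= e^{n\ln Z_0}\,\theta_{\tilde{P}_{[]}(\hat{x}(1),x)}\,\theta(h > 0). \tag{181b}
\end{align}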

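Applying Eq.(181b) to Eq.(180) gives

$$P_{suc} = N_m \int_{-\infty}^{+\infty}\frac{dh}{2\pi}\,\theta(h > 0) \int_{-\infty}^{+\infty} dK\; e^{iK(R-h)}\, e^{n\ln Z_0}\,\theta_{\tilde{P}_{[]}(\hat{x}(1),x)}. \tag{182}$$

Next we make the following change of variables:

$$K \to K + in. \tag{183}$$

Let

$$W_0 = [Z_0]_{K \to K+in} = E_x E_{\hat{x}(1)}\big[P^{1 - i\frac{K}{n}}(\hat{x}(1) : x)\big]. \tag{184}$$

Under this change of variables, Eq.(182) becomes

$$P_{suc} = N_m \int_{-\infty}^{+\infty}\frac{dh}{2\pi}\,\theta(h > 0)\, e^{-n(R-h)} \int_{-\infty}^{+\infty} dK\; e^{iK(-h+R)}\, e^{n\ln W_0}\,\theta_{\tilde{P}_{[]}(\hat{x}(1),x)}. \tag{185}$$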

^4 I don't know how to prove this assumption rigorously. The assumption is plausible, and it does lead to the correct result for the channel capacity. It may just be an approximation that becomes increasingly good as $n \to \infty$.

Next we use Eqs.(11) and (12) to expand $\ln W_0$ to second order in $K$. This yields

$$\ln W_0 \approx -i\frac{K}{n}a - \frac{K^2}{2n^2}b, \tag{186}$$

where

$$a = H(\underline{\hat{x}} : \underline{x}), \tag{187}$$

and

\begin{align}
b &= E_x E_{\hat{x}} P(\hat{x} : x)\ln^2 P(\hat{x} : x) - H^2(\underline{\hat{x}} : \underline{x}) \tag{188a}\\
&= E_{\hat{x},x}\ln^2 P(\hat{x} : x) - \left[E_{\hat{x},x}\ln P(\hat{x} : x)\right]^2 \tag{188b}\\
&\ge 0. \tag{188c}
\end{align}

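The second-order expansion of $\ln W_0$ can be checked numerically with a small sketch. It assumes that $W_0 = \sum_{\hat{x},x} P(\hat{x},x)\exp\left(-i\frac{K}{n}\ln P(\hat{x}:x)\right)$ with $P(\hat{x}:x) = P(\hat{x},x)/(P(\hat{x})P(x))$, and the joint distribution in the usage note is an arbitrary example, not one taken from the paper.

```python
import cmath
import math


def log_ratio_table(joint):
    """List of (P(x_hat,x), ln P(x_hat:x)) for a joint table joint[x_hat][x]."""
    p_hat = [sum(row) for row in joint]
    p_x = [sum(joint[h][x] for h in range(len(joint)))
           for x in range(len(joint[0]))]
    table = []
    for h, row in enumerate(joint):
        for x, p in enumerate(row):
            if p > 0.0:
                table.append((p, math.log(p / (p_hat[h] * p_x[x]))))
    return table


def coefficients_a_b(joint):
    """a = mean of ln P(x_hat:x) (the mutual information), b = its variance."""
    table = log_ratio_table(joint)
    a = sum(p * ell for p, ell in table)
    b = sum(p * ell * ell for p, ell in table) - a * a
    return a, b


def ln_W0_exact(joint, K, n):
    """ln of sum over (x_hat,x) of P(x_hat,x) * exp(-i (K/n) ln P(x_hat:x))."""
    w0 = sum(p * cmath.exp(-1j * (K / n) * ell)
             for p, ell in log_ratio_table(joint))
    return cmath.log(w0)


def ln_W0_second_order(joint, K, n):
    """Truncation -i (K/n) a - (K^2 / (2 n^2)) b."""
    a, b = coefficients_a_b(joint)
    return -1j * (K / n) * a - (K ** 2 / (2 * n ** 2)) * b
```

For example, with `joint = [[0.4, 0.1], [0.1, 0.4]]`, `n = 100` and `K = 5`, the two functions agree up to a term of third order in `K / n`.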
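With the $\ln W_0$ expanded to second order in $K$, and $\theta_{\tilde{P}_{[]}(\hat{x}(1),x)}$ to zeroth order in $K$, Eq.(185) becomes

$$P_{suc} \approx \theta_{P_{\hat{x},x}}\, N_m \int_{-\infty}^{+\infty}\frac{dh}{2\pi}\,\theta(h > 0)\, e^{n(h-R)} \int_{-\infty}^{+\infty} dK\; e^{iK(-a-h+R) - \frac{K^2}{2n}b}. \tag{189}$$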

If we keep only the term linear in $K$ in the argument of the exponential, we immediately get

$$P_{suc} \approx \theta_{P_{\hat{x},x}}\, N_m e^{-na}\,\theta(R > a) \approx N_m e^{-nR}\,\theta(R > H(\underline{\hat{x}} : \underline{x})). \tag{190}$$

Minimizing both sides of Eq.(190) with respect to the channel $P_{\underline{\hat{x}}|\underline{x}}$ and using the definition of the rate distortion function $H_{\underline{x}}(D)$, we get that there is an encoding and a decoding for which

$$P_{suc} = N_m e^{-nR}\,\theta(R > H_{\underline{x}}(D)). \tag{191}$$

QED 
Claim 18

$$R = R_m \tag{192}$$

for consistency of our arguments.

proof: For consistency, we must have $N_m e^{-nR} = 1$ in Eq.(191).
QED 

A Appendix: Some Integrals Over Polytopes 

This appendix is a collection of integration formulas for doing integrals over polytope 
shaped regions. These formulas are useful for doing p-type integrations. 

The standard polytope is defined as the set $\Delta^n = \{(t_0, t_1, \ldots, t_n) : t_0 + t_1 + \ldots + t_n = 1,\ t_j \ge 0 \text{ for all } j\}$.

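For $\{P_x\}_{\forall x} \in pd(S_{\underline{x}})$, we define the following integration operator:

$$\int \mathcal{D}P_x = \prod_x\left\{\int_0^1 dP_x\right\}\delta\Big(\sum_x P_x - 1\Big). \tag{193}$$

This is the same definition as Eq.(18), except for an arbitrary vector $\{P_x\}_{\forall x}$ instead of just for a p-type $\{P_{[x^n]}(x)\}_{\forall x}$.

It is well known and easy to show by induction that

$$\int \mathcal{D}P_x\; 1 = \frac{1}{(N_{\underline{x}} - 1)!}. \tag{194}$$

More generally, the so called Dirichlet integral, defined by

\begin{align}
I_n &= \prod_{j=1}^{n}\left\{\int_0^1 dx_j\; x_j^{\alpha_j - 1}\right\}\delta\Big(\sum_{j=1}^{n} x_j - 1\Big) \tag{195a}\\
&= \prod_{j=1}^{n-1}\left\{\int_0^1 dx_j\; x_j^{\alpha_j - 1}\right\}\theta\Big(\sum_{j=1}^{n-1} x_j \le 1\Big)\Big(1 - \sum_{j=1}^{n-1} x_j\Big)^{\alpha_n - 1} \tag{195b}
\end{align}

can be shown^5 to be equal to

$$I_n = \frac{\prod_{i=1}^{n}\Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^{n}\alpha_i\right)}, \tag{196}$$

where $\Gamma(\cdot)$ stands for the Gamma function. $\Gamma(n) = (n-1)!$ for any positive integer $n$.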
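The Dirichlet formula can be sanity-checked numerically for $n = 3$. The sketch below is only an illustration: the midpoint rule, the grid size, and the function names are choices made here, and the quadrature is only accurate to roughly a percent.

```python
import math


def dirichlet_closed_form(alphas):
    """Product of Gamma(alpha_i) divided by Gamma(sum of alpha_i), as in Eq.(196)."""
    return math.prod(math.gamma(a) for a in alphas) / math.gamma(sum(alphas))


def dirichlet_numeric_n3(alphas, steps=400):
    """Midpoint-rule estimate of the n = 3 integral with x3 = 1 - x1 - x2 eliminated."""
    a1, a2, a3 = alphas
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x1 = (i + 0.5) * h
        for j in range(steps):
            x2 = (j + 0.5) * h
            rest = 1.0 - x1 - x2  # value of x3 fixed by the delta function
            if rest > 0.0:        # theta function restricting to x1 + x2 <= 1
                total += x1 ** (a1 - 1) * x2 ** (a2 - 1) * rest ** (a3 - 1)
    return total * h * h
```

Setting all $\alpha_j = 1$ reduces the integral to the volume of the polytope, which the closed form gives as $1/\Gamma(3) = 1/2!$, in agreement with Eq.(194) for three components.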

In SIT, when doing p-type integrals for large n, one often encounters integrals 
of sharply peaked Gaussian functions integrated over polytope regions. Since the 
Gaussians are sharply peaked, as long as their peak is not near the boundary of the 
polytope region, the integrals can be easily evaluated approximately in a Gaussian 
approximation which becomes increasingly accurate as n increases. 

Recall that

$$\int_{-\infty}^{+\infty} dx\; e^{-\lambda x^2} = \sqrt{\frac{\pi}{\lambda}} \tag{197}$$

for $\lambda > 0$.



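Claim 19 Suppose $\{Q_x\}_{\forall x} \in pd(S_{\underline{x}})$, $\Delta P_x = P_x - Q_x$, and $\lambda_x \gg 1$ for all $x \in S_{\underline{x}}$. Then

$$\int \mathcal{D}P_x\; \exp\Big(-\sum_x \lambda_x(\Delta P_x)^2\Big) \approx \sqrt{\frac{\pi^{N_{\underline{x}}-1}\,\lambda_\parallel}{\prod_x \lambda_x}} \tag{198}$$

where $\frac{1}{\lambda_\parallel} = \sum_x \frac{1}{\lambda_x}$. (If the $\lambda_x$ are thought of as electrical resistances connected in parallel, then $\lambda_\parallel$ is the equivalent resistance.)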



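^5 See, for example, Ref.[5] for a proof.

proof: Let LHS and RHS denote the left hand side and right hand side of Eq.(198). One has

\begin{align}
LHS &= \prod_x\left\{\int_{-\infty}^{+\infty} d\Delta P_x\right\}\delta\Big(\sum_x \Delta P_x\Big)\exp\Big(-\sum_x \lambda_x(\Delta P_x)^2\Big) \tag{199a}\\
&= \int_{-\infty}^{+\infty}\frac{dk}{2\pi}\; I, \tag{199b}
\end{align}

where

\begin{align}
I &= \prod_x\left\{\int_{-\infty}^{+\infty} d\Delta P_x\; \exp\left(-\lambda_x(\Delta P_x)^2 + ik\Delta P_x\right)\right\} \tag{200a}\\
&= \prod_x\left\{e^{-\frac{k^2}{4\lambda_x}}\int_{-\infty}^{+\infty} d\Delta P_x\; \exp\Big(-\lambda_x\Big(\Delta P_x - \frac{ik}{2\lambda_x}\Big)^2\Big)\right\} \tag{200b}\\
&= e^{-\frac{k^2}{4\lambda_\parallel}}\prod_x\sqrt{\frac{\pi}{\lambda_x}}. \tag{200c}
\end{align}

Thus

\begin{align}
LHS &= \sqrt{\frac{\pi^{N_{\underline{x}}}}{\prod_x \lambda_x}}\int_{-\infty}^{+\infty}\frac{dk}{2\pi}\; e^{-\frac{k^2}{4\lambda_\parallel}} \tag{201a}\\
&= \sqrt{\frac{\pi^{N_{\underline{x}}}}{\prod_x \lambda_x}}\;\frac{1}{2\pi}\sqrt{4\pi\lambda_\parallel} \tag{201b}\\
&= RHS. \tag{201c}
\end{align}

QED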
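Claim 19 can be checked numerically for three components. The sketch below eliminates $\Delta P_3 = -\Delta P_1 - \Delta P_2$ using the delta function and integrates the remaining two variables on a grid. It assumes, as the claim does, that the peak is far from the polytope boundary, so the finite integration limits are replaced by a symmetric window; the window size, grid, and function names are choices made for this example.

```python
import math


def claim19_rhs(lams):
    """sqrt(pi^(N-1) * lam_parallel / product of lam_x), with 1/lam_parallel = sum 1/lam_x."""
    n = len(lams)
    lam_parallel = 1.0 / sum(1.0 / lam for lam in lams)
    return math.sqrt(math.pi ** (n - 1) * lam_parallel / math.prod(lams))


def claim19_lhs_n3(lams, half_width=1.0, steps=400):
    """Midpoint-rule estimate of the constrained Gaussian integral for N = 3."""
    l1, l2, l3 = lams
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        t1 = -half_width + (i + 0.5) * h
        for j in range(steps):
            t2 = -half_width + (j + 0.5) * h
            t3 = -(t1 + t2)  # enforced by delta(sum of Delta P_x)
            total += math.exp(-(l1 * t1 * t1 + l2 * t2 * t2 + l3 * t3 * t3))
    return total * h * h
```

For instance, `claim19_lhs_n3([50.0, 80.0, 120.0])` and `claim19_rhs([50.0, 80.0, 120.0])` should agree closely, since the Gaussian is negligible outside the integration window.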

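Claim 20 Suppose matrix $(A_{x,x'})_{\forall x,x'}$ has eigenvalues $\{\lambda_x\}_{\forall x}$. Suppose $\{Q_x\}_{\forall x} \in pd(S_{\underline{x}})$, $\Delta P_x = P_x - Q_x$, and $\lambda_x \gg 1$ for all $x \in S_{\underline{x}}$. Then

$$\int \mathcal{D}P_x\; \exp\Big(-\sum_{x,x'}\Delta P_x A_{x,x'}\Delta P_{x'}\Big) \approx \sqrt{\frac{\pi^{N_{\underline{x}}-1}\,\lambda_\parallel}{\prod_x \lambda_x}}. \tag{202}$$

proof: Just diagonalize the matrix $A_{x,x'}$ and use the previous claim, where now the $\lambda_x$ are the eigenvalues of $A$.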

QED 

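For $\{P_{y|x}\}_{\forall y} \in pd(S_{\underline{y}})$ for all $x \in S_{\underline{x}}$, we define the following integration operator:

$$\int \mathcal{D}P_{y|x} = \prod_{x,y}\left\{\int_0^1 dP_{y|x}\right\}\prod_x\Big\{\delta\Big(\sum_y P_{y|x} - 1\Big)\Big\}. \tag{203}$$

This is the same definition as Eq.(16), except for an arbitrary vector $\{P_{y|x}\}_{\forall y}$ instead of just for a p-type $\{P_{[y^n|x^n]}(y|x)\}_{\forall y}$.
Note that Eq.(194) implies that

$$\int \mathcal{D}P_{y|x}\; 1 = \frac{1}{[(N_{\underline{y}} - 1)!]^{N_{\underline{x}}}}. \tag{204}$$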



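Claim 21 Suppose matrix $A_{y|x,y'|x'}$ has eigenvalues $\{\lambda_{y|x}\}_{\forall x,y}$. Suppose $\{Q_{y|x}\}_{\forall y} \in pd(S_{\underline{y}})$, $\Delta P_{y|x} = P_{y|x} - Q_{y|x}$, and $\lambda_{y|x} \gg 1$ for all $x \in S_{\underline{x}}$ and $y \in S_{\underline{y}}$. Then (using Einstein's repeated index summation convention)

$$\int \mathcal{D}P_{y|x}\; \exp\left(-\Delta P_{y|x}A_{y|x,y'|x'}\Delta P_{y'|x'}\right) \approx \sqrt{\frac{\pi^{N_{\underline{x}}(N_{\underline{y}}-1)}}{\det(A)\,\det\Big[\sum_{y_1,y_2}(A^{-1})_{y_1|x_1,y_2|x_2}\Big]_{\forall x_1,x_2}}}. \tag{205}$$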



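proof: Let LHS and RHS denote the left hand side and right hand side of Eq.(205). Let $(u_y)_{y \in S_{\underline{y}}}$ be a vector with all components equal to one. Then

\begin{align}
LHS &= \prod_{x,y}\left\{\int_{-\infty}^{+\infty} d\Delta P_{y|x}\right\}\prod_x\left\{\delta(u_y\Delta P_{y|x})\right\}e^{-\Delta P_{y|x}A_{y|x,y'|x'}\Delta P_{y'|x'}} \tag{206a}\\
&= \prod_x\left\{\int_{-\infty}^{+\infty}\frac{dk_x}{2\pi}\right\} I, \tag{206b}
\end{align}

where

\begin{align}
I &= \prod_{x,y}\left\{\int_{-\infty}^{+\infty} d\Delta P_{y|x}\right\}e^{-\Delta P_{y|x}A_{y|x,y'|x'}\Delta P_{y'|x'} + ik_x u_y\Delta P_{y|x}} \tag{207a}\\
&= \prod_{x,y}\left\{\int_{-\infty}^{+\infty} d\Delta P_{y|x}\right\}e^{-\Delta\tilde{P}_{y|x}A_{y|x,y'|x'}\Delta\tilde{P}_{y'|x'}}\;e^{-\frac{1}{4}k_x u_y (A^{-1})_{y|x,y'|x'}u_{y'}k_{x'}} \tag{207b}
\end{align}

where

$$\Delta\tilde{P}_{y|x} = \Delta P_{y|x} - \frac{i}{2}k_{x'}u_{y'}(A^{-1})_{y'|x',y|x}. \tag{208}$$

Thus

$$I = \sqrt{\frac{\pi^{N_{\underline{x}}N_{\underline{y}}}}{\det A}}\;e^{-\frac{1}{4}k_x u_y (A^{-1})_{y|x,y'|x'}u_{y'}k_{x'}}. \tag{209}$$

Thus

\begin{align}
LHS &= \sqrt{\frac{\pi^{N_{\underline{x}}N_{\underline{y}}}}{\det A}}\prod_x\left\{\int_{-\infty}^{+\infty}\frac{dk_x}{2\pi}\right\}e^{-\frac{1}{4}k_x u_y (A^{-1})_{y|x,y'|x'}u_{y'}k_{x'}} \tag{210a}\\
&= \sqrt{\frac{\pi^{N_{\underline{x}}N_{\underline{y}}}}{\det A}}\;\frac{1}{(2\pi)^{N_{\underline{x}}}}\sqrt{\frac{(4\pi)^{N_{\underline{x}}}}{\det\Big[\sum_{y_1,y_2}(A^{-1})_{y_1|x_1,y_2|x_2}\Big]_{\forall x_1,x_2}}} \tag{210b}\\
&= RHS. \tag{210c}
\end{align}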

QED 

When using many of the integration formulas presented in this appendix, it 
is necessary to calculate the inverse and determinant of a large matrix. I found the 
following formulas can often be helpful in doing this. 

Claim 22 Suppose $E$ is an $n \times n$ matrix. Suppose $p$ and $q$ are $n$ component column vectors. Suppose

$$A = E + pq^T. \tag{211}$$

Then

\begin{align}
A^{-1} &= E^{-1} - \frac{E^{-1}pq^TE^{-1}}{1 + q^TE^{-1}p}, \tag{212a}\\
\det(A) &= \det(E)(1 + q^TE^{-1}p). \tag{212b}
\end{align}

proof: To prove Eq.(212a), just show that the right hand sides of Eqs.(211) and (212a) multiply to one.

To prove Eq.(212b), one may proceed as follows. We will assume $A \in \mathbb{C}^{3 \times 3}$ for concreteness. The proof we will give generalizes easily to $A$'s of dimension different from 3. Let $\epsilon_{j_1 j_2 j_3}$ be the totally antisymmetric tensor with 3 indices. We will use Einstein summation convention. Let

$$Q_j = q_k(E^{-1})_{k,j}. \tag{213}$$

Then

\begin{align}
\det(A) &= \det(E)\det(\delta_{i,j} + p_iQ_j) \tag{214a}\\
&= \det(E)\,\epsilon_{j_1 j_2 j_3}(\delta_{1,j_1} + p_1Q_{j_1})(\delta_{2,j_2} + p_2Q_{j_2})(\delta_{3,j_3} + p_3Q_{j_3}) \tag{214b}\\
&= \det(E)(1 + p_jQ_j). \tag{214c}
\end{align}

QED
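Both formulas of Claim 22 can be verified numerically on a small example. The sketch below works with plain 3 x 3 lists; the helper names and the test matrices in the usage note are illustrative assumptions, and it presumes that $E$ is invertible and $1 + q^TE^{-1}p \ne 0$.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate (transposed cofactor matrix)."""
    d = det3(m)
    cof = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [k for k in range(3) if k != i]
            c = [k for k in range(3) if k != j]
            minor = m[r[0]][c[0]] * m[r[1]][c[1]] - m[r[0]][c[1]] * m[r[1]][c[0]]
            cof[i][j] = (-1) ** (i + j) * minor
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]


def matmul3(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rank_one_update(E, p, q):
    """Return (A, A^{-1}, det A) for A = E + p q^T using Eqs.(212a) and (212b)."""
    E_inv = inv3(E)
    E_inv_p = [sum(E_inv[i][k] * p[k] for k in range(3)) for i in range(3)]
    qT_E_inv = [sum(q[k] * E_inv[k][j] for k in range(3)) for j in range(3)]
    denom = 1.0 + sum(q[i] * E_inv_p[i] for i in range(3))
    A = [[E[i][j] + p[i] * q[j] for j in range(3)] for i in range(3)]
    A_inv = [[E_inv[i][j] - E_inv_p[i] * qT_E_inv[j] / denom for j in range(3)]
             for i in range(3)]
    return A, A_inv, det3(E) * denom
```

For example, with `E = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]`, `p = [1, 2, 0]` and `q = [0, 1, 1]`, the returned determinant matches `det3(A)` and `matmul3(A, A_inv)` is the identity up to rounding.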

Claim 23 Suppose $A$ is an $n \times n$ matrix, and $0 < \epsilon \ll 1$. Then

$$\det(1 + \epsilon A) = 1 + \epsilon\,\mathrm{tr}(A) + O(\epsilon^2). \tag{215}$$

proof: Just diagonalize $A$.
QED
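A quick numerical check of Eq.(215) for a 3 x 3 matrix is sketched below. The matrix in the usage note and the helper names are arbitrary choices for illustration. The remainder should shrink by roughly a factor of four when $\epsilon$ is halved, as expected of an $O(\epsilon^2)$ term.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def first_order_remainder(A, eps):
    """det(1 + eps*A) - (1 + eps*tr(A)), the O(eps^2) remainder in Eq.(215)."""
    perturbed = [[(1.0 if i == j else 0.0) + eps * A[i][j] for j in range(3)]
                 for i in range(3)]
    trace = sum(A[i][i] for i in range(3))
    return det3(perturbed) - (1.0 + eps * trace)
```

With `A = [[1, 2, 0], [0, -1, 3], [2, 1, 4]]`, the remainder at `eps = 1e-3` is of order `1e-6`.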

References 

[1] Thomas M. Cover, Joy A. Thomas, Elements of Information Theory (Wiley-Interscience, 1991)

[2] Daphne Koller, Nir Friedman, Probabilistic Graphical Models, Principles and Techniques (MIT Press, 2009)

[3] Imre Csiszár, János Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems (Academic Press, 1981)

[4] Walter Rudin, Principles of Mathematical Analysis, 3rd ed. (McGraw-Hill, 1976)

[5] H. Jeffreys, B.S. Jeffreys, Methods of Mathematical Physics, 3rd ed. (Cambridge University Press, 1988)






